Applied and Environmental Microbiology. 2021 Aug 11;87(17):e00467-21. doi: 10.1128/AEM.00467-21

Legionella pneumophila CRISPR-Cas Suggests Recurrent Encounters with One or More Phages in the Family Microviridae

Shayna R Deecker a, Malene L Urbanus a, Beth Nicholson a, Alexander W Ensminger a,b
Editor: M Julia Pettinari c
PMCID: PMC8357283  PMID: 34132590

ABSTRACT

Legionella pneumophila is a ubiquitous freshwater pathogen and the causative agent of Legionnaires’ disease. L. pneumophila growth within protists provides a refuge from desiccation, disinfection, and other remediation strategies. One outstanding question has been whether this protection extends to phages. L. pneumophila isolates are remarkably devoid of prophages and to date no Legionella phages have been identified. Nevertheless, many L. pneumophila isolates maintain active CRISPR-Cas defenses. So far, the only known target of these systems is an episomal element that we previously named Legionella mobile element 1 (LME-1). The continued expansion of publicly available genomic data promises to further our understanding of the role of these systems. We now describe over 150 CRISPR-Cas systems across 600 isolates to establish the clearest picture yet of L. pneumophila’s adaptive defenses. In a search for targets of 1,500 unique CRISPR-Cas spacers, LME-1 remains the only identified CRISPR-Cas-targeted integrative element. We identified 3 additional LME-1 variants, all targeted by previously and newly identified CRISPR-Cas spacers, but no other similar elements. Notably, we also identified several spacers with significant sequence similarity to microviruses, specifically those within the subfamily Gokushovirinae. These spacers are found across several different CRISPR-Cas arrays isolated from geographically diverse isolates, indicating recurrent encounters with these phages. Our analysis of the extended Legionella CRISPR-Cas spacer catalog leads to two main conclusions: current data argue against CRISPR-Cas-targeted integrative elements beyond LME-1, and the heretofore unknown L. pneumophila phages are most likely lytic gokushoviruses.

IMPORTANCE Legionnaires’ disease is an often-fatal pneumonia caused by Legionella pneumophila, which normally grows inside amoebae and other freshwater protists. L. pneumophila trades diminished access to nutrients for the protection and isolation provided by the host. One outstanding question is whether L. pneumophila is susceptible to phages, given the protection provided by its intracellular lifestyle. In this work, we use Legionella CRISPR spacer sequences as a record of phage infection to predict that the “missing” L. pneumophila phages belong to the microvirus subfamily Gokushovirinae. Gokushoviruses are known to infect another intracellular pathogen, Chlamydia. How do gokushoviruses access L. pneumophila (and Chlamydia) inside their “cozy niches”? Does exposure to phages happen during a transient extracellular period (during cell-to-cell spread) or is it indicative of a more complicated environmental lifestyle? One thing is clear: 100 years after their discovery, phages continue to hold important secrets about the bacteria upon which they prey.

KEYWORDS: Legionella pneumophila, CRISPR-Cas, spacer target, LME-1, gokushovirus, microvirus, phage, bacteriophages

INTRODUCTION

Legionella pneumophila is a Gram-negative, intracellular bacterium that is ubiquitous in freshwater environments (1–3), where it replicates within a wide range of protist hosts (4). If a contaminated water source becomes aerosolized and inhaled, L. pneumophila can infect human lung macrophages and cause a severe pneumonia known as Legionnaires’ disease (5–7). Replication in the accidental human host uses similar molecular strategies to those used to infect protists (8). As humans are an evolutionary dead end for the pathogen (9), understanding how L. pneumophila is able to persist and replicate in environmental reservoirs is critical to limiting its ability to cause human disease.

Protozoan hosts not only serve as a replicative niche to L. pneumophila (1, 4), but also provide protection from desiccation (10), temperature changes (11, 12), and disinfectants (11–14). One outstanding question is whether these hosts also protect L. pneumophila from invasion by foreign genetic elements, such as bacteriophages (phages). Notably, phages for the obligate intracellular pathogens Chlamydia psittaci and Chlamydia pecorum have been described (15–18), raising the possibility that phages may access L. pneumophila even within the protection of the host.

Despite promising early reports, the published record does little to settle the question as to whether Legionella can be infected by lytic phages. In one report, preliminary analysis suggested that Legionella phages of the Myoviridae family could be isolated from freshwater samples (19), but inefficient phage enrichment prevented the preservation of stocks for validation and further study. Around the same time, another group reported visualization of temperate Legionella phage, but this occurred in a well-characterized, fully sequenced strain with no potential to generate phage particles (20). Now, a decade later, no subsequent studies have confirmed either laboratory’s findings or identified any other type of L. pneumophila phage. Lysogeny also seems uncommon in the Legionella genus, as the only known “prophage-like” elements described to date are Legionella mobile element 1 (LME-1), which has been proposed to descend from an ancestral phage (21, 22), and a putative prophage in one Legionella micdadei isolate (23).

Many L. pneumophila isolates maintain active and adaptive CRISPR-Cas systems (22, 24–27), providing another avenue by which to explore the species’ relationship with phage. Because L. pneumophila CRISPR-Cas protects against foreign threats, the CRISPR array in each L. pneumophila genome serves as a sequence-based diary of past environmental encounters. When a bacterium encounters a foreign element (e.g., a plasmid or a phage), a short DNA sequence (spacer) can be obtained and incorporated into a CRISPR array (28–31). The transcribed CRISPR RNA forms a complex with associated Cas proteins and uses complementary base-pairing and endonuclease activity to cleave the foreign element and protect the bacterium (32–35). Given the sequence-based functionality of CRISPR-Cas systems, sequence similarity can be applied to identify targets of these systems for individual bacterial species (22, 36) and in a global context (37–40).

We previously used spacer sequence similarity to identify the first known target of L. pneumophila CRISPR-Cas systems, a mobile genetic element known as LME-1 (22). In the same study, we showed that LME-1 restricts host range and that L. pneumophila CRISPR-Cas successfully defends against transfer of the element between strains (22). However, we found that only ∼3% of L. pneumophila spacers match to LME-1 (22), leaving the vast majority of spacers without defined targets. In this study, we have expanded our analyses to include a collection of over 600 L. pneumophila isolates and 1,500 unique spacers. Leveraging this expanded data set, we identify only two recurrent targets of L. pneumophila CRISPR-Cas: LME-1 and phages belonging to the microvirus subfamily Gokushovirinae.

RESULTS AND DISCUSSION

Legionella integrated elements identified by prophage prediction.

The only target of L. pneumophila CRISPR-Cas identified to date is a ∼30-kb integrated mobile element, LME-1 (22). It has been previously reported that LME-1 contains a number of predicted proteins of phage origin (21, 22), so we decided to revisit whether variants of LME-1 or other integrated L. pneumophila phage-like elements could be identified using established prophage prediction tools. We subjected a set of over 600 publicly available L. pneumophila genomes (Table S1 in the supplemental material) to analysis by two complementary prophage prediction programs, PhiSpy (41) and VirSorter (42), which use different methods to identify potential phage-like sequences. PhiSpy predictions are based on a variety of metrics, such as AT/GC skew, phage sequence signatures, strand directionality, and the presence of potential attachment sites (41). VirSorter identifies viral sequences by similarity to phage proteins (hmmsearch and BLASTp) and metrics associated with virus-like genome structures, such as the presence of hallmark viral genes, enrichment in uncharacterized genes, and depletion in strand switching between consecutive genes (42).

Of 35 predictions that overlapped between VirSorter and PhiSpy, 7 were LME-1 variants that were flanked by the previously described LME-1 attachment site and contained nearly identical gene content (Table S2). Each of the remaining 28 predicted regions contained csrT, a regulatory gene which is coinherited with type IV secretion systems from Legionella integrative conjugative elements (ICEs) (43) (Table S2). By CsrT phylogeny, these ICEs all fall into the previously described group IV Lgi elements (Fig. S1) (43).
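The overlap step described above can be sketched as a simple interval intersection. This is a minimal illustration, assuming the predictions from each tool have already been reduced to (contig, start, end) tuples; the tuple layout and the function name are hypothetical and do not reproduce either tool's output format.

```python
from collections import defaultdict

def overlapping_predictions(phispy, virsorter, min_overlap=1):
    """Return PhiSpy regions that overlap at least one VirSorter region.

    Each region is a (contig, start, end) tuple with 1-based inclusive
    coordinates. This tuple layout is an assumption for illustration only.
    """
    by_contig = defaultdict(list)
    for contig, start, end in virsorter:
        by_contig[contig].append((start, end))

    shared = []
    for contig, start, end in phispy:
        for v_start, v_end in by_contig[contig]:
            # Length of the intersection of two inclusive intervals.
            overlap = min(end, v_end) - max(start, v_start) + 1
            if overlap >= min_overlap:
                shared.append((contig, start, end))
                break  # report each PhiSpy region once
    return shared
```

Raising `min_overlap` would make the comparison stricter; the threshold actually used in the study is not stated in this section.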

To determine if any additional LME-1 variants might have been missed by our analyses, we searched each L. pneumophila genome for a signature of LME-1 integration, i.e., the presence of two LME-1 attachment (att) sites. This analysis identified 8 att-flanked insertions of >29 kb (Table S3). These integrants corresponded to all the previously identified LME-1 variants above, along with one additional LME-1 variant missed by PhiSpy. We also identified 24 instances of a short, 469-bp insertion at the LME-1 att site (Table S3). This insertion contains a predicted open reading frame that is 110 amino acids in size, shares no sequence similarity to LME-1 or phages, and is predicted to be an L. pneumophila hypothetical protein by BLASTp (44). Notably, unlike LME-1 insertions, the second att site for these instances contains several substitutions and insertions, arguing against a contemporary integration event.
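The att-site signature can be illustrated with a short sketch. It assumes the att sequence is available as a plain string and considers only exact matches on one strand; the search tool and matching tolerance used in the study are not described in this section, so the function below is a hypothetical simplification.

```python
def att_flanked_insertions(genome, att, min_size=29_000):
    """Find consecutive att sites separated by more than min_size bp.

    `genome` and `att` are plain DNA strings; only exact, same-strand
    matches are considered (a simplifying assumption of this sketch).
    Returns (left_site_start, right_site_start, insert_size) tuples.
    """
    positions = []
    start = genome.find(att)
    while start != -1:
        positions.append(start)
        start = genome.find(att, start + 1)

    insertions = []
    for left, right in zip(positions, positions[1:]):
        # Distance from the end of the left site to the start of the right one.
        size = right - (left + len(att))
        if size > min_size:
            insertions.append((left, right, size))
    return insertions
```

Lowering `min_size` would also report short insertions such as the 469-bp element, provided the second att copy still matched exactly, which the text indicates is not the case for those instances.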

We next examined this expanded set of LME-1 variants for differences in gene content and sequence similarity to phage genes. Based on core genome phylogeny, the LME-1 sequences segregate into 5 distinct variants with a set of 32 core genes and 17 accessory genes (Fig. 1, Table S4). We examined each LME-1 predicted gene product for similarity to phage proteins using BLASTp and HHpred (44, 45). Strikingly, 27/32 of the core LME-1 genes share similarity to phage proteins (and 6/17 accessory genes). The preponderance of phage-like genes in the LME-1 core (including key phage structural proteins) clearly points to a phage origin (21, 22). It also raises the intriguing possibility that these sequences may generate phage-like particles under one or more heretofore untested experimental conditions.

FIG 1.


Core orthologous genes in LME-1 enriched for sequence similarity to known phage genes. Orthologous groups for LME-1 genes were determined using OrthoMCL. The core LME-1 genomes were aligned with MUSCLE; a phylogeny was constructed using FastTree and subsequently visualized in GGTREE. Orthologous group presence/absence in each LME-1 genome was visualized with ggplot2. Red dots adjacent to each orthologous group indicate sequence similarity to known phage genes via BLASTp and/or HHpred predictions (Table S4).

Cataloging the scope of L. pneumophila CRISPR-Cas defenses.

Based on our analysis of 18 L. pneumophila isolates, we previously identified LME-1 as the only recurrent target of L. pneumophila CRISPR-Cas (22). Leveraging the availability of a drastically expanded genome set, we revisited our analysis of L. pneumophila CRISPR-Cas to determine if one or more additional targets might emerge. To begin, we systematically searched for CRISPR-Cas systems in over 600 L. pneumophila isolates (Table S1, Fig. S2A) using CRISPRCasFinder (46) and CRISPRDetect (47). Combined with previously identified systems (22, 24, 25, 27, 48), we identified a total of 13 isolates with type I-C CRISPR systems, 47 with type I-F systems, and 108 with type II-B systems. Many of these systems contain arrays with conserved spacer content and variance in spacer loss patterns and/or 5′ spacers (Table S5, I-C and I-F). These patterns are reflective of shared ancestry and can be leveraged to identify recent acquisition events from contemporary sources (22, 27, 31). Combined, these systems contain a total of 1,589 unique spacers (Table S5), a 4-fold increase relative to our earlier search for targets of L. pneumophila CRISPR-Cas (22). Nevertheless, rarefaction curves clearly show the continued value of further sequencing, especially for genomes containing type I-C systems (Fig. S2B). This is consistent with our previous observations that at least two of these systems (I-C and I-F) are adaptive (22, 27), suggesting that additional sampling of spacer sequences will continue to reveal new contemporary encounters between L. pneumophila and its invaders.
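A rarefaction curve of the kind referred to above can be approximated by repeatedly shuffling the genome order and counting cumulative unique spacers. The sketch below assumes each isolate's spacers are available as a list of strings; it is one common way to build such a curve and is not necessarily the procedure used for Fig. S2B.

```python
import random

def rarefaction_curve(arrays, permutations=100, seed=0):
    """Mean number of unique spacers seen after sampling 1..n genomes.

    `arrays` maps an isolate name to its list of spacer sequences.
    Genome order is shuffled `permutations` times and the cumulative
    unique-spacer counts are averaged across shuffles.
    """
    rng = random.Random(seed)
    names = list(arrays)
    totals = [0.0] * len(names)
    for _ in range(permutations):
        rng.shuffle(names)
        seen = set()
        for i, name in enumerate(names):
            seen.update(arrays[name])
            totals[i] += len(seen)
    return [total / permutations for total in totals]
```

A curve that is still rising at its right-hand end indicates that sequencing more genomes would be expected to reveal spacers not yet in the catalog.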

LME-1 is the only known integrative element targeted by Legionella CRISPR-Cas systems.

With our expanded spacer library, we next sought to identify the matching protospacer sequences targeted by these systems. Our previous analysis, limited in both spacer sequences (queries) and L. pneumophila genomes (to search), nevertheless identified two variants of LME-1 as recurrent targets of all 3 types of L. pneumophila CRISPR-Cas (22). Backed by our prophage analysis above, we sought to determine if LME-1 remained uniquely targeted relative to other integrated genomic elements. We queried our expanded spacer set against a local BLAST database of over 600 L. pneumophila genomic sequences to search for any integrated elements that overlap with our prophage analysis or may have been missed (Table 1, Fig. S2C) (49, 50).

TABLE 1.

Target databases used for spacer target searches for integrated elements and exogenous elements

| Database name | Target type | Date downloaded | No. of genomes, reads, or contigs | Genome completion status | Reference(s) |
| --- | --- | --- | --- | --- | --- |
| European Nucleotide Archive - Legionella pneumophila | Integrated elements | 2019-01-09 | 519 | Draft | 50 |
| NCBI - Legionella pneumophila | Integrated elements | 2019-04-08 | 83 | Complete | 49 |
| Sequence Read Archive - Legionella pneumophila | Integrated elements | 2019-04-08 | 9 | Raw reads | 22 |
| DataONE Dash Plasmids | Exogenous elements | 2019-04-05 | 10,892 | Complete | 52 |
| NCBI - Microviridae | Exogenous elements | 2019-12-02 | 2,864 | Complete & draft | 49 |
| NCBI - Viruses | Exogenous elements | 2019-04-08 | 28,652 | Complete | 49 |
| IMG/VR | Exogenous elements | 2020-01-29 | 734,969 | Metagenomic | 55, 56 |
| Microviridae-associated contigs from IMG/M | Exogenous elements | 2020-02-04 | 14,724 | Metagenomic | 57 |
| Microviridae-associated contigs from marine sources | Exogenous elements | 2020-07-30 | 12,775 | Metagenomic | 53, 54 |
| Oxford Nanopore reads from an aquifer | Exogenous elements | 2020-06-25 | 2,380,279 | Metagenomic | 58 |

Even within the expanded spacer catalog, LME-1 remains the only integrated target of L. pneumophila CRISPR-Cas. Thirty-two spacers target the element, representing approximately 2% of the unique spacer library (Table S6). LME-1-targeting spacers are present in 11 ancestrally distinct CRISPR-Cas array groups (∼32% of all L. pneumophila arrays have a spacer that targets LME-1), covering all three CRISPR-Cas subtypes (Table 2, Fig. 2, see Table S5 for all spacers that have been ascribed a target). Within individual arrays, we observe multiple instances of adjacent LME-1-targeting spacers, consistent with repeat exposure and primed spacer acquisition (31) (Fig. 2). Of the 32 spacers that target LME-1, the vast majority (28 spacers) target all variants.

TABLE 2.

CRISPR spacer distribution: breakdown of the number of unique spacers for each CRISPR-Cas subtype, the number of spacers with identified targets, and the number by target class

| CRISPR-Cas subtype | Type I-C | Type I-F | Type II-B |
| --- | --- | --- | --- |
| No. unique spacers | 270 | 774 | 545 |
| No. spacers with target | 13 | 52 | 6 |
| Percentage of library with target | 4.81 | 6.72 | 1.10 |
| No. spacers targeting L. pneumophila genome: LME-1 | 6 | 21 | 5 |
| No. spacers targeting Microviridae | 6 | 29 | 0 |
| No. spacers targeting L. pneumophila genome: other | 0 | 1 | 1 |
| No. spacers targeting Myoviridae | 0 | 1 | 0 |
| No. spacers targeting Caudovirales | 1 | 0 | 0 |

FIG 2.


L. pneumophila CRISPR-Cas systems repeatedly target LME-1 and microvirus phages. A Unix shell script was written to search for targets of L. pneumophila CRISPR-Cas. This pipeline masked putative type I-C, I-F, and II-B CRISPR arrays in input genomes before creating a local BLAST database. It then queried the unique spacer library against this database using BLAST and extracted putative target sequences for downstream analyses. The arrays were visualized with CRISPRStudio; white boxes represent spacers without ascribed targets, while colored boxes represent spacers with ascribed targets. To reduce redundancy, for ancestrally related arrays (arrays with 2 or more shared spacers at the 3′ end), the longest array is shown. Shorter arrays (with overlapping spacer content already represented in the figure) are not shown unless they contain one or more unique spacers with an identifiable target. These ancestrally related arrays with divergent but ascribed 5′ spacer content are grouped together, with spacer loss shown as dashes and shared spacers between arrays indicated by connecting vertical lines. See Table S5 for a complete list of all L. pneumophila spacers with ascribed targets.
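The pipeline in the legend was written as a Unix shell script, which is not reproduced in this section. The Python sketch below only illustrates the same three steps under stated assumptions: array coordinates are taken to be 0-based, half-open intervals, masking is done with N characters, and the `blastn-short` task is an assumed choice for short spacer queries. The helper names are hypothetical; the BLAST+ programs and flags shown are standard, but the commands are built as argument lists and not executed here.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Replace each 0-based, half-open (start, end) region with mask_char."""
    masked = list(sequence)
    for start, end in regions:
        # Clamp to the sequence bounds so the length never changes.
        start = max(0, start)
        end = min(end, len(masked))
        if start < end:
            masked[start:end] = mask_char * (end - start)
    return "".join(masked)

def blast_commands(genome_fasta, spacer_fasta, db_name, out_tsv):
    """Build makeblastdb and blastn argument lists (not executed here)."""
    make_db = ["makeblastdb", "-in", genome_fasta,
               "-dbtype", "nucl", "-out", db_name]
    search = ["blastn", "-task", "blastn-short",  # assumed task for short queries
              "-query", spacer_fasta, "-db", db_name,
              "-outfmt", "6", "-out", out_tsv]
    return make_db, search
```

Masking the arrays before building the database prevents every spacer from trivially matching its own array; each returned list could be passed to `subprocess.run` on a machine with BLAST+ installed.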

While two additional spacers mapped back to other L. pneumophila genomic targets (Table 2, Fig. 2, Table S6), neither falls within the ICEs identified by PhiSpy or VirSorter above. Close inspection of the genomic regions surrounding these singleton hits does not suggest the presence of a mobile element or prophage based on the presence of phage-like proteins or %GC content. Notably absent from the collection of spacer hits are sequences matching the 5 endogenous plasmids (Paris [48], Lorraine [48], Lens [24, 48], Mississauga-2006 [22], and OLDA [51]) previously described in L. pneumophila. Likewise, we observe no matches between spacers and 22 additional endogenous L. pneumophila plasmids deposited into NCBI (Table S1) or even a larger, general database of plasmid sequences (52) (Table 1). Taken together, LME-1 remains the only identified integrative element (prophage-like or otherwise) targeted by L. pneumophila CRISPR-Cas.

L. pneumophila CRISPR-Cas targets phages.

The observation that LME-1 is the only identifiable prophage-like element targeted by CRISPR-Cas argues against widespread lysogeny in L. pneumophila and is consistent with prior observations that the Legionella genus is remarkably prophage-poor (23). The question remains, however, as to whether one or more lytic phages might pose an exogenous threat to L. pneumophila survival. To look for evidence of past L. pneumophila-phage encounters, we used the unique spacer library to query an extensive database of viral and phage sequences with over 28,000 viral genomes from NCBI (Table 1) (49). Within this collection, only one class of targets presented a signature similar to that of LME-1 (both in the number of unique spacers and the diversity of systems involved). Phages from the family Microviridae are, like LME-1, recurrent targets of L. pneumophila CRISPR-Cas—targeted by 28 spacers across 9 ancestrally distinct array groups (∼28% of all L. pneumophila arrays have a spacer that targets one or more phages from the family Microviridae) (Table 2, Fig. 2, Tables S5 and S6).

Many of the microvirus- and LME-1-targeting spacers are found near the 5′ end of the array, suggesting a recent acquisition event as this is where new spacers are generally acquired (31) (Fig. 2). We have previously shown that the type I-C (Toronto-2005) (22) and the type I-F (Lens chromosome, Lens plasmid, Mississauga-2006) (22, 27) CRISPR-Cas systems of L. pneumophila are active and adaptive and as such are still capable of acquiring new spacers. Some L. pneumophila isolates share an identical CRISPR array at the 3′ end, but possess unique spacers at the 5′ end of the array. This indicates recent spacer acquisition after divergence (Fig. 2; compare FJEJ01 with FJAB01 and Lens-plasmid with FJBW01) and some of these unique spacers target microviruses or LME-1. Moreover, as with LME-1, several instances of adjacent microvirus-targeting spacers can be found within individual arrays—again suggestive of multiple exposures and consistent with primed spacer acquisition (31) (Fig. 2). Collectively, this suggests that microviruses, like LME-1, are contemporary and recurrent targets of L. pneumophila CRISPR-Cas.
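The comparison of arrays that share 3′ spacers but differ at the 5′ end can be sketched as follows. Arrays are assumed to be lists of spacer identifiers written 5′ to 3′, and the threshold of 2 shared 3′ spacers follows the definition of ancestrally related arrays given in the Fig. 2 legend. The sketch ignores internal spacer loss, which the real arrays do show, so it is a simplification.

```python
def shared_three_prime(array_a, array_b):
    """Number of identical spacers shared at the 3' end (arrays listed 5'->3')."""
    count = 0
    for a, b in zip(reversed(array_a), reversed(array_b)):
        if a != b:
            break
        count += 1
    return count

def divergent_five_prime(array_a, array_b, min_shared=2):
    """Return the 5' spacers unique to each array, or None if the arrays
    do not share at least `min_shared` spacers at the 3' end."""
    shared = shared_three_prime(array_a, array_b)
    if shared < min_shared:
        return None
    return array_a[:len(array_a) - shared], array_b[:len(array_b) - shared]
```

Spacers returned in the 5′ portions are the candidates for acquisition after the two lineages diverged.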

Since many of the targeted microvirus genomes were assembled from metagenomic reads and subsequently deposited into NCBI, we reasoned there could be additional targets of L. pneumophila CRISPR-Cas present in metagenomic data. As such, we next searched for spacer hits across several metagenomic data sets (Table 1) (53–58). These analyses identified 8 additional spacers that targeted metagenomic contigs. BLASTn analysis of these contigs determined that 7 of the additional spacers targeted contigs of likely microvirus origin and 1 targeted a contig of likely Caudovirales origin (Table 2, Fig. 2, Table S7) (44).

In total, 35 L. pneumophila spacers match sequences of microvirus origin: 25 spacers target the conserved major capsid protein VP1, 4 target the DNA pilot protein VP2, and 5 target the replication initiation protein VP4. One spacer targets a putative noncoding region (Table 3). As with the spacers against LME-1, microvirus-targeting spacers come from a genetically and geographically diverse set of L. pneumophila isolates. This suggests that one or more microviruses may represent a commonly encountered threat to L. pneumophila in the environment.

TABLE 3.

Microviridae target regions: names, sequences, and predicted target regions for the unique spacers targeting Microviridae

| Spacer ID | Spacer sequence (5′ to 3′) | Predicted target region |
| --- | --- | --- |
| Alcoy-spacer32-IF | TGCTTTGAGTCATATTGGAGAGCAAGCTGTTC | VP1 (major capsid protein) |
| Calgary2017-spacer6-IF | TTTGAACCGTATGTTTTCTCGTCAGACACGTT | VP1 (major capsid protein) |
| FJAB01-spacer12-IF | TAGTATTGTAACTGGTCAGTTTAGGTCTAATT | VP1 (major capsid protein) |
| FJAB01-spacer53-IF | AAGATGAGACAACGCAGGCCAATAAAAATCAA | VP1 (major capsid protein) |
| FJDB01-spacer29-IF | TTATCTGCCGCGCTTCGCTTGCCTGCTACTTT | Putative noncoding region |
| FJDB01-spacer38-IF | TAAAGTAGCTGCCAGCATAGGATTAATACCCG | VP2 (DNA pilot protein) |
| FJDB01-spacer39-IF | TGATGATTCTACGGATTATTTAGTTCCCACGA | VP1 (major capsid protein) |
| FJDB01-spacer69-IF | AATTAATGTTAATCCTGTACCTCAGACTAGTT | VP1 (major capsid protein) |
| FJDB01-spacer70-IF | AATTATCCCAAACAAGACGATACGGAACAGCA | VP1 (major capsid protein) |
| FJEJ01-spacer1-IF | TAGATAATTGTTTTGCACCTGTTGCAAGTCTA | VP1 (major capsid protein) |
| JFII01-spacer5-IC | TGAATTTGAAACGCCTCTCGCAACTGATTAATAG | VP1 (major capsid protein) |
| JFII01-spacer14-IC | CTGATGTTTATGCTGATTTGTCAAATGCTACTGCTG | VP1 (major capsid protein) |
| JFII01-spacer19-IC | ACACCATACTCTCCACAATGATAATAACGAATAC | VP4 (replication initiation protein) |
| JFII01-spacer30-IC | ATAATATAACGAGCAACATAAGCAGCAGAATTAAA | VP4 (replication initiation protein) |
| JFII01-spacer31-IC | AAATTAAAAGTATCACCAGGTAAAGCTTCATCAA | VP1 (major capsid protein) |
| JFII01-spacer32-IC | TCCTGAATAAACGATGAATTAAGAACAGGCAAAG | VP1 (major capsid protein) |
| JFIM01-spacer10-IF | TACCATTAATAACAACAAAATCACCATTATAG | VP4 (replication initiation protein) |
| JFIM01-spacer13-IF | TGATAATCCTGCTGATTATGTCCTTTTGAGGC | VP1 (major capsid protein) |
| JFIM01-spacer15-IF | ATCATACGTTGATATGATGTATTAGACATACG | VP2 (DNA pilot protein) |
| JFIM01-spacer16-IF | ACAAGCAAATTTGGCTGCTTTTGGTACTGCTA | VP1 (major capsid protein) |
| JFIM01-spacer18-IF | CAAACAAGTCGATAAGGAACAGCAAAAAAGAA | VP1 (major capsid protein) |
| JFIM01-spacer19-IF | TCAAGATAAACAGAATCACCTTTCTGGTAAGA | VP1 (major capsid protein) |
| LBAK01-spacer49-IF | AGCAGCCTTCATATCAGCCACCATACGCTGAT | VP2 (DNA pilot protein) |
| LBAK01-spacer52-IF | TCTTGGCACTACTGCTCCAGTTATTCGTGATG | VP1 (major capsid protein) |
| LBAK01-spacer53-IF | AACGAGTTTGACGACTAAACATCCTATTCAAA | VP1 (major capsid protein) |
| LBAN01-spacer4-IC | ATTCTCCTTCTTTTGCTTTTTCTACTAATAGCGA | VP1 (major capsid protein) |
| Lens-plasmid-spacer5-IF | AAAATAATCCGACAAACTACCATTCGCATAAC | VP1 (major capsid protein) |
| Lens-plasmid-spacer6-IF | TTGGCCTCAGAAAGGCCCTGCTGTTGAGTTAC | VP1 (major capsid protein) |
| Lens-plasmid-spacer28-IF | ACTGTGGTGAGTATGGTGAGAAGTACGGAAGA | VP4 (replication initiation protein) |
| Mississauga2006-spacer1-IF | AAATCAAATTGTACGCACGATGAAACAAACTA | VP1 (major capsid protein) |
| Mississauga2006-spacer2-IF | ACGACGAAGAAGAACATAATCAGCAGGATTGT | VP1 (major capsid protein) |
| Mississauga2006-spacer34-IF | TCAAAAAGGTCCTGCTGTTGAGTTGCCTTTGG | VP1 (major capsid protein) |
| Mississauga2006-spacer43-IF | TTGTGGTCAGTGTATTGGTTGTAAGTTACGCA | VP4 (replication initiation protein) |
| Mississauga2006-spacer44-IF | TCCTATTGATCGTATTAAAGCTCTTGGTGATG | VP1 (major capsid protein) |
| Mississauga2006-spacer69-IF | TTCTCAAGTTGGTGCTCCTATGCAAACTGGTG | VP2 (DNA pilot protein) |

Taken together, approximately 4% of our spacer library has ascribed targets, with ∼94% of these targeting LME-1 and microviruses (Table 2). This is comparable to a comprehensive analysis of spacer sequences in over 48,000 bacterial and archaeal genomes from 8 different phyla by Shmakov and colleagues, in which they sought to determine spacer matches for over 42,000 CRISPR-Cas arrays and over 360,000 unique spacers (39). Approximately 7% of the spacers from these systems were matches to predominantly viral sequences (39). They suggest the remaining ∼93% “dark matter” spacers did not have identifiable spacer matches for two possible reasons: (i) the spacer match has not yet been sequenced or (ii) mutational escape in the protospacer sequence reduces the likelihood that a spacer will be bioinformatically ascribed to it (39). The lack of any other multiply targeted class of phage in our data does not rule out the possibility that nonmicrovirus phages could encounter L. pneumophila. However, our data clearly indicate that, in addition to LME-1, Legionella has recurrent encounters with one or more members of the Microviridae, which we now consider to be our best lead in the hunt for L. pneumophila phages.
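The two summary percentages in the paragraph above follow directly from the counts in Table 2. The short calculation below reproduces them; the variable names are for illustration only.

```python
# Counts copied from Table 2 (columns: type I-C, I-F, II-B).
unique_spacers = {"I-C": 270, "I-F": 774, "II-B": 545}
with_target = {"I-C": 13, "I-F": 52, "II-B": 6}
lme1 = {"I-C": 6, "I-F": 21, "II-B": 5}
microviridae = {"I-C": 6, "I-F": 29, "II-B": 0}

total_unique = sum(unique_spacers.values())                  # 1,589
total_targeted = sum(with_target.values())                   # 71
recurrent = sum(lme1.values()) + sum(microviridae.values())  # 67

pct_with_target = 100 * total_targeted / total_unique  # about 4.5%
pct_recurrent = 100 * recurrent / total_targeted       # about 94.4%
```

The remaining 4 targeted spacers are the two other L. pneumophila genomic hits, the Myoviridae hit, and the Caudovirales hit listed in Table 2.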

Microviruses targeted by L. pneumophila likely belong to the subfamily Gokushovirinae.

The family Microviridae consists of the recognized subfamilies Bullavirinae, which includes well-characterized members such as phi-X174, alpha3, and G4 (59, 60), and Gokushovirinae, first characterized as phages of Chlamydia (15–18), Spiroplasma (61–64), and Bdellovibrio (65). Several additional subfamilies have been proposed in recent years, including Pichovirinae (66), Alpavirinae (67), Sukshmavirinae (68), dragonfly-associated microvirus “Group D” (68, 69), a Parabacteroidetes group based on Parabacteroidetes prophages (70, 71), and Pequeñovirus (70, 72).

To determine if we could further narrow the targets of L. pneumophila CRISPR-Cas to one or more of these subfamilies, we first generated a major capsid (VP1) phylogeny with all 3,000 available microvirus genomes (Fig. 3, Table S8). We also included 14 spacer-targeted metagenomic contigs (identified above) with overlaps of 99 to 135 bp on the ends, as this overlap suggests these to be complete, circularizable microvirus genomes (Table S7). We then labeled the top phage genome match for each spacer on the tree. Strikingly, nearly all the identified spacer targets belong to the subfamily Gokushovirinae (Fig. 3). First identified as phages for intracellular pathogens such as Chlamydia and Spiroplasma, gokushoviruses are ubiquitous and abundant in diverse environments, including marine, freshwater, soil, fecal, and animal tissue environments (68–81), and infect hosts ranging from marine bacteria (54) to enterobacteria (82). The only additional hit is to genomes clustering with the proposed subfamily Pichovirinae, a single spacer notable mainly because it targets a noncoding region which one would not expect to be well conserved (66) (Fig. S3).
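The terminal-overlap criterion for circularizable contigs can be illustrated with a direct string comparison. The sketch assumes exact identity between the start and end of a contig and uses the 99 to 135 bp window quoted above as default bounds; how the overlaps were actually detected is not described in this section.

```python
def terminal_overlap(contig, min_len=99, max_len=135):
    """Length of the longest prefix (within the size window) that is
    repeated exactly at the 3' end of the contig, or 0 if none is found."""
    upper = min(max_len, len(contig) // 2)
    for k in range(upper, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0
```

A nonzero result is consistent with an assembler having read through the junction of a circular genome, so that one copy of the repeat could be trimmed before downstream analysis.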

FIG 3.


A phylogeny of microvirus capsids reveals the L. pneumophila spacer targets belong to the subfamily Gokushovirinae. Amino acid sequences of major capsid proteins from 3,014 microviruses were aligned using MUSCLE. A phylogeny was generated from this alignment using FastTree and rooted to the Bullavirinae clade. Subfamily designations based on the literature are denoted in the inner colored ring as indicated. Gaps in this ring represent unclassified microviruses. The top hits based on the fewest number of mismatches to L. pneumophila spacers are shown in the outer ring, with their target gene designated. VP1 is the major capsid protein, VP2 is the DNA pilot protein, and VP4 is the replication initiation protein. Darker spacer bars represent spacers that meet the stringency criteria of 5 or fewer mismatches with their protospacers, the presence of a canonical PAM, and no mismatches in the seed sequence of the spacer.

Extension of spacer sequence similarity into nonconserved regions of gokushoviruses.

We next asked whether one or more of the gokushovirus-targeting spacers mapped back to regions of low conservation between phages. In particular, spacer sequences that mapped back to predicted host-determinant regions of the capsid (64, 66) might be highly informative, suggestive of a close match. In contrast, highly conserved sequences are present in many related phage genomes, reducing the discriminatory power of an individual spacer to identify the specific phage whose encounter led to its acquisition. Not surprisingly, the vast majority of gokushovirus-targeting spacers map back to regions of high conservation, likely reflective of the overrepresentation of conserved sequences within any database (Fig. 4A). We then relaxed the stringency of our mapping criteria to examine potential spacers that could map to the variable regions of the capsid. If the exact spacer match to this region has not yet been sequenced, we would expect there to be more mismatches since this region has lower sequence conservation than the rest of the capsid. Two spacers target the DNA encoding the flanking sequence of the predicted host-determinant variable region and extend into it (Fig. 4B) and one spacer maps to the DNA encoding the internal variable sequence (Fig. 4C). Given the contribution of the capsid variable region to gokushovirus host range, these spacers likely provide the first glimpse into specific phage sequences required to infect L. pneumophila.

FIG 4.


Spacers that meet the stringent criteria largely map to conserved regions of their target genes. (A) The VP1, VP2, and VP4 gene sequences were aligned for the targeted gokushovirus genomes that meet the stringency criteria (5 or fewer mismatches with the protospacer, a canonical PAM for the CRISPR subtype, and no mismatches in the seed sequence region [positions 1 to 5, 7, and 8]) and are the top hit for a spacer. Nucleotide sequence conservation is shown as a heatmap within the arrow, while spacers mapped onto the alignment are shown as purple boxes. The two boxes marked with a red asterisk contain an insertion in the protospacer region that is present in 1/15 genomes used in the alignment. The host determinant region of the major capsid protein is denoted with a gray box. The gray scale bar indicates nucleotide positions within each open reading frame. (B) Sequence alignments of two nonstringent spacers that border the host determinant region with their respective top hit phage genome. The spacer name is shown in purple, mismatches between the spacer and the protospacer are nonbolded characters, and the gray box shows the portion of the spacer that is in the host determinant region. (C) Sequence alignment of a nonstringent spacer that is contained within the host determinant region. Labeling is as shown for panel B.
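The stringency criteria in the legend can be expressed as a small filter. This is a sketch under stated assumptions: the spacer and protospacer are already aligned without gaps, position 1 is the first seed base (which end that corresponds to depends on the CRISPR subtype and is left to the caller), and the set of canonical PAMs is supplied by the caller because the PAM sequences are not given in this section. The function name and parameters are hypothetical.

```python
SEED_POSITIONS = (1, 2, 3, 4, 5, 7, 8)  # 1-based positions listed in the legend

def meets_stringency(spacer, protospacer, pam, canonical_pams,
                     max_mismatches=5, seed_positions=SEED_POSITIONS):
    """Apply the three criteria to an ungapped spacer/protospacer pair.

    Returns True only if there are at most `max_mismatches` mismatches,
    the observed PAM is in `canonical_pams`, and no mismatch falls on a
    seed position.
    """
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must be aligned without gaps")
    mismatches = [i + 1 for i, (s, p) in enumerate(zip(spacer, protospacer))
                  if s != p]
    if len(mismatches) > max_mismatches:
        return False
    if pam not in canonical_pams:
        return False
    return not any(pos in seed_positions for pos in mismatches)
```

Relaxing `max_mismatches` or passing an empty `seed_positions` tuple gives a nonstringent filter of the kind used for the spacers in panels B and C, although the exact relaxed thresholds are not stated here.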

Using L. pneumophila CRISPR-Cas systems, we have developed an updated, sequence-based composite sketch of L. pneumophila phage. Our findings suggest that isolation of phages from Legionella-containing water sources should be performed under conditions likely to capture gokushoviruses (53, 54, 83), using a diverse collection of CRISPR-mutant strains. The size of gokushoviruses (4 to 5 kb) also raises the intriguing possibility that synthetic biology could be used to develop one or more Legionella-infecting virions (84), starting with the sequences we have identified to date.

In summary, the ∼30-kb episomal element LME-1 that we previously demonstrated to be a recurrent target of an evolutionarily diverse set of L. pneumophila CRISPR-Cas arrays (22) remains the only known integrated element targeted by L. pneumophila’s CRISPR-Cas defenses. Intriguingly, we also found that L. pneumophila CRISPR-Cas defenses are directed against sequences with strong similarity to phages of the Gokushovirinae. While the specific phages that target L. pneumophila remain uncharacterized, gokushoviruses meet the criteria expected for a heretofore undiscovered L. pneumophila phage: they are largely lytic, consistent with the dearth of prophage sequences across Legionella, and they are known to infect other intracellular bacteria, such as Chlamydia (15–18). The suitability of phage therapy for intracellular pathogens remains an open question. Identification of L. pneumophila phages would be of practical importance to study how phages access bacteria with an intracellular lifestyle. Compared to Chlamydia, L. pneumophila is much more genetically tractable, can be cultured outside host cells in the laboratory, and grows inside a staggering diversity of eukaryotic hosts (4). As such, L. pneumophila would represent a powerful model system by which to understand the feasibility and constraints of using phages to target intracellular pathogens for phage therapy or, in the case of Legionella, bioremediation of water systems.

MATERIALS AND METHODS

Legionella genomes used in this study.

Legionella pneumophila genomes (draft and completed) were downloaded from the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home) (50) and from NCBI (https://www.ncbi.nlm.nih.gov/) (49). A complete list of accession numbers can be found in Table S1.

Some target sequence data were produced by the U.S. Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/) in collaboration with the user community (Table S7).

Prophage identification.

The downloaded L. pneumophila genomes were annotated with Prokka (v.1.14.6) (85) and the resulting GenBank formatted files were analyzed using PhiSpy (41) with the default settings. The original nucleotide FASTA files were analyzed using VirSorter (42) with the default settings.

CsrT amino acid sequences from mobile elements (43) identified by PhiSpy and VirSorter were extracted and aligned with MUSCLE (86) and a tree was generated using FastTree (v 2.1.11) with the default settings (87). The tree was rooted to group I, Trb (43), and visualized using the Interactive Tree of Life (iTOL) web server (88). Reference sequences were annotated as per Abbott and colleagues (43).

The 29-nucleotide LME-1 attachment site (GTCTGATTATCAAAATAATCAGACTTAAT) (22) was queried by BLASTn against L. pneumophila genomes using the following parameters: -gapopen 10 -gapextend 2 -reward 1 -penalty -1 -evalue 1 -word_size 7 (44).
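As an illustration only, the attachment-site query could be assembled as follows. The scoring parameters and the attachment-site sequence are those stated above; the file name `att_site.fasta`, the database name `lp_genomes`, and the tabular output format are hypothetical placeholders and are not taken from the study.

```python
import shlex

# Attachment-site sequence as given in the text (29 nucleotides).
ATT_SITE = "GTCTGATTATCAAAATAATCAGACTTAAT"


def blastn_args(query_fasta, db, evalue):
    """Build a blastn argument list with the scoring parameters from the text.

    query_fasta and db are hypothetical names supplied by the caller;
    "-outfmt 6" (tabular output) is an assumption made for this sketch.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-gapopen", "10",
        "-gapextend", "2",
        "-reward", "1",
        "-penalty", "-1",
        "-evalue", str(evalue),
        "-word_size", "7",
        "-outfmt", "6",
    ]


# E value of 1 for the attachment-site search, as stated above.
args = blastn_args("att_site.fasta", "lp_genomes", 1)
print(shlex.join(args))
```

The same helper could be reused for the spacer searches described below by passing an E value of 0.01 instead of 1.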

Core genome identification across LME-1 variants.

The FASTA sequences for unique LME-1 variants were subjected to an OrthoMCL analysis to look at the overlap in gene content. Briefly, the sequences were annotated using Prodigal (89), then were analyzed using OrthoMCL (v.2.0.9) with the default settings (90). The resulting clusters were queried by BLASTp using the NCBI viral database (taxid:10239) (44) and using HHpred (https://toolkit.tuebingen.mpg.de/tools/hhpred) (45) against “UniProt-Swiss-Prot-viral70_23_Aug_2020” using default settings to determine if any had sequence similarity to known phage genes.

To generate a core genome phylogeny, the core genes from each LME-1 variant were aligned using MUSCLE (86) and a tree was generated using FastTree (v 2.1.11) using the default settings and the GTR model (87). The tree was visualized with GGTREE (v. 2.2.4) (91), and the orthologous group presence/absence map was plotted with ggplot2 (v. 3.3.3) (92).

CRISPR-Cas identification and spacer cataloging.

CRISPRCasFinder (46) was used to detect putative CRISPR-Cas systems in L. pneumophila genomes (Table S1) and CRISPRDetect (47) was used to determine the transcriptional direction of each CRISPR array. Spacers were extracted from each array and compiled to form spacer libraries. Redundant spacers were removed from the libraries to create a unique spacer library for downstream analyses. Rarefaction curves for the spacer library of each CRISPR-Cas subtype were generated using Analytic Rarefaction (v. 2.2.1) using the default parameters (available from https://strata.uga.edu/software/index.html). The curves were plotted using ggplot2 (v. 3.3.2) (Fig. S2) (92).
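The two bookkeeping steps above, removing redundant spacers and estimating how spacer richness grows with sampling depth, can be sketched as follows. This is a minimal sketch under stated assumptions: redundancy is treated as exact, case-insensitive sequence identity, and the rarefaction uses the standard hypergeometric expectation. Whether the Analytic Rarefaction program computes exactly this quantity is an assumption, and the function names are illustrative.

```python
from math import comb


def unique_spacers(spacers):
    """Drop exact duplicate spacer sequences, keeping first-seen order.

    Assumption: two spacers are redundant only if their sequences are
    identical after upper-casing (reverse complements are not merged).
    """
    seen = set()
    unique = []
    for seq in spacers:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def rarefied_richness(counts, n):
    """Expected number of distinct spacers in a random subsample of size n.

    counts[i] is the number of times unique spacer i occurs in the full
    library. Uses the hypergeometric form:
        E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n))
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the library size")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts)


library = ["ACGT", "acgt", "TTGA", "GGCA"]
print(unique_spacers(library))
print([round(rarefied_richness([2, 1, 1], n), 3) for n in range(5)])
```

A curve that plateaus as n approaches the library size suggests that most of the spacer diversity for that subtype has been sampled.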

CRISPR-Cas target search.

A pipeline to search for targets of L. pneumophila CRISPR-Cas was written as a Unix shell script. L. pneumophila genomes were first searched for putative type I-C, I-F, and II-B CRISPR arrays using the repeat structure as a guide. If a CRISPR array was detected, it was subsequently masked to minimize false-positive hits to the input genomes. A BLAST database was then constructed containing the Legionella genomes with masked CRISPR arrays to look for integrated genetic elements (see Table 1). The unique spacer library (see above) was queried against the BLAST database (BLASTn parameters: -gapopen 10 -gapextend 2 -reward 1 -penalty -1 -evalue 0.01 -word_size 7) (44). The BLAST search was done twice, once for each strand. The BLAST output was then processed to score each hit. The scoring metric subtracted the length of the alignment between a spacer and a hit from the length of the spacer, then added the number of mismatches between the two sequences. For example, an alignment length of 30, a spacer length of 32, and 0 mismatches would generate a score of 2. This was done to take both alignment length and number of mismatches into account when ordering the hits for downstream analyses (for example, a hit with 0 mismatches and a short alignment length would not be ranked higher than a hit with 2 mismatches but an alignment length across the entire spacer). After scoring, the BLAST output was converted into a BED format. During this step, the putative target sequences were extended to account for alignment length, followed by the addition of a 3 nucleotide extension on either end of the sequence for downstream protospacer adjacent motif (PAM) filtering. SAMtools (93) and BEDTools (94) were used to extract the putative target sequences and their flanking regions from the input genomes. Finally, the pipeline searched for canonical PAMs for the type I-C, I-F, and II-B systems in the flanking region of the putative target sequence. 
Although our pipeline searched for type I-C, I-F, and II-B PAMs, we considered this secondary information and were intentionally PAM agnostic when assigning spacers to target groups to avoid missing potential targets that have since escaped through PAM mutations or those targeted by noncanonical PAMs. Analysis of the PAM sequences present in the target sequences showed that many target sequences did contain the canonical PAM (Fig. S4).
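The hit-scoring rule and the coordinate extension described above can be illustrated with a short sketch. The published pipeline is a Unix shell script; the Python below is not that script, and the function names, the BLAST-style 1-based coordinate inputs, and the clamping at position 0 are assumptions made for illustration.

```python
def hit_score(spacer_length, alignment_length, mismatches):
    """Score a hit as (unaligned spacer positions) + (mismatches).

    Lower is better; 0 means a full-length, perfect match.
    """
    return (spacer_length - alignment_length) + mismatches


def bed_interval(qstart, qend, sstart, send, spacer_length, flank=3):
    """Convert a BLAST hit to a BED-style interval covering the whole spacer.

    Inputs are 1-based inclusive BLAST coordinates (query = spacer,
    subject = genome). The subject range is first extended to the full
    spacer length, then padded by `flank` nucleotides on each side for
    PAM inspection. Returns (start, end, strand) with a 0-based start and
    an exclusive end. The start is clamped at 0; the end is not clamped
    because the contig length is not known here.
    """
    left_missing = qstart - 1               # spacer bases before the alignment
    right_missing = spacer_length - qend    # spacer bases after the alignment
    if sstart <= send:
        strand = "+"
        lo = sstart - left_missing
        hi = send + right_missing
    else:
        # Minus-strand hit: the spacer's 5' end lies at the higher coordinate.
        strand = "-"
        lo = send - right_missing
        hi = sstart + left_missing
    return max(0, lo - 1 - flank), hi + flank, strand


# Worked example from the text: alignment 30, spacer 32, 0 mismatches.
print(hit_score(32, 30, 0))
# A full-length alignment with 2 mismatches receives the same score.
print(hit_score(32, 32, 2))
print(bed_interval(3, 32, 103, 132, 32))
```

Because the short perfect alignment and the full-length alignment with 2 mismatches both score 2, the former is not ranked above the latter, which is the behavior the scoring metric was designed to give.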

The pipeline was also used to search for exogenous threats to the bacterium in viral, plasmid, and metagenomic sequences and run as described above (data sets listed in Table 1). The output from the pipeline can be found in Table S6 for complete/draft genomes, and Table S7 for metagenomic data. CRISPR array groups were visualized with CRISPRStudio (95). PAM sequence logos for the hits were plotted using ggseqlogo (v 0.1) (96).

Major capsid protein phylogeny of targeted microvirus genomes.

A phylogeny of microvirus major capsid proteins was generated using the amino acid sequences from 3,014 genomes (Table S8). The amino acid sequences were aligned using MUSCLE (86) and the tree was generated with FastTree (v 2.1.11) using the default settings (87). The resulting tree was rooted to the subfamily Bullavirinae and visualized using the iTOL web server (88). The subfamily designations for each phage based on the literature were plotted as a color-coded ring circling the tree (Table S8).

Sequence conservation and spacer mapping for repeat L. pneumophila CRISPR-Cas targets.

Sequence conservation analysis and spacer mapping were performed on the microvirus genomes that contained a top-hit target for a spacer that met the following stringency criteria: 5 or fewer mismatches with its protospacer, a canonical PAM for its CRISPR subtype, and no mismatches in the seed sequence region (positions 1 to 5, 7, 8). These stringency criteria were relaxed to examine potential spacer matches with the variable region of the major capsid protein. The VP1 (major capsid protein), VP2 (DNA pilot protein), and VP4 (replication initiation protein) genes were extracted from these genomes using Geneious Prime 2021, and the sequences for each gene were aligned using the MUSCLE plug-in (86). The alignment, consensus sequence, sequence identity profiles, and spacer location for each gene were plotted using GViz (v. 1.32.0) (97), binning the sequence conservation with a sliding window of 5.
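The stringency criteria can be expressed as a simple filter. The sketch below is illustrative only: it assumes that the spacer and protospacer are supplied as equal-length strings in the same orientation, that position 1 is the first character of each string, and that PAM status has already been determined elsewhere and is passed in as a Boolean. The names are hypothetical.

```python
SEED_POSITIONS = (1, 2, 3, 4, 5, 7, 8)  # 1-based seed positions from the text
MAX_MISMATCHES = 5


def mismatch_positions(spacer, protospacer):
    """Return the 1-based positions at which the two sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must be the same length")
    pairs = zip(spacer.upper(), protospacer.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


def is_stringent(spacer, protospacer, has_canonical_pam):
    """Apply the three criteria: mismatch count, canonical PAM, clean seed."""
    mismatches = mismatch_positions(spacer, protospacer)
    seed_clean = not any(pos in SEED_POSITIONS for pos in mismatches)
    return (
        len(mismatches) <= MAX_MISMATCHES
        and has_canonical_pam
        and seed_clean
    )


spacer = "ACGTACGTACGTACGTACGTACGTACGTACGT"
# Position 6 is outside the seed as defined above, so one mismatch there passes.
pos6_variant = spacer[:5] + "T" + spacer[6:]
# Position 7 is inside the seed, so the same single mismatch fails.
pos7_variant = spacer[:6] + "A" + spacer[7:]
print(is_stringent(spacer, pos6_variant, True))
print(is_stringent(spacer, pos7_variant, True))
```

Relaxing the criteria, as was done to examine the variable region of the major capsid protein, would correspond to loosening one or more of these three tests.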

Code availability.

The script used to search for targets of L. pneumophila CRISPR-Cas systems is available on GitHub at https://github.com/EnsmingerLab/LegionellaCRISPRTargetSearch.

Data availability.

Public sequence data for published and unpublished studies are available from GenBank (NCBI), European Nucleotide Archive (ENA), Joint Genome Institute (JGI), or Dryad. Accession numbers and data source can be found in Tables S1, S7, and S8 in the supplemental material.

ACKNOWLEDGMENTS

We thank Eric Bastien and Melissa Duhaime for early access to their metagenomic data to search for additional L. pneumophila CRISPR-Cas targets. In particular, we want to acknowledge the entire community of researchers who have shared data through JGI, especially Karthik Anantharaman, Vincent Denef, Katherine McMahon, David Walsh, and Erica Young for access to their unpublished data. We thank Félix Croteau and Alexander Hynes for their suggestions regarding Legionella prophage analysis, Jordan Lin for his help with OrthoMCL analyses, and David Faguy for discussions regarding coding, along with members of the Ensminger laboratory for their suggestions and careful reading of the manuscript.

S.R.D. is supported by a fellowship from the Department of Biochemistry, University of Toronto and an Ontario Graduate Scholarship. This work was supported by a Project Grant from the Canadian Institutes of Health Research (PHT-148819).

Footnotes

Supplemental material is available online only.

Supplemental file 1
Figures S1 to S4. AEM.00467-21-s0001.pdf, PDF file, 0.2 MB
Supplemental file 2
Tables S1 to S8. AEM.00467-21-s0002.xlsx, XLSX file, 0.5 MB

Contributor Information

Malene L. Urbanus, Email: malene.urbanus@utoronto.ca.

Alexander W. Ensminger, Email: alex.ensminger@utoronto.ca.

M. Julia Pettinari, University of Buenos Aires

REFERENCES

  • 1.Rowbotham TJ. 1980. Preliminary report on the pathogenicity of Legionella pneumophila for freshwater and soil amoebae. J Clin Pathol 33:1179–1183. 10.1136/jcp.33.12.1179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Fliermans C. 1996. Ecology of Legionella: from data to knowledge with a little wisdom. Microb Ecol 32:203–228. 10.1007/BF00185888. [DOI] [PubMed] [Google Scholar]
  • 3.van Heijnsbergen E, Schalk JAC, Euser SM, Brandsema PS, Boer den JW, de Roda Husman AM. 2015. Confirmed and potential sources of Legionella reviewed. Environ Sci Technol 49:4797–4815. 10.1021/acs.est.5b00142. [DOI] [PubMed] [Google Scholar]
  • 4.Boamah DK, Zhou G, Ensminger AW, O'Connor TJ. 2017. From many hosts, one accidental pathogen: the diverse protozoan hosts of Legionella. Front Cell Infect Microbiol 7:477. 10.3389/fcimb.2017.00477. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.McDade JE, Shepard CC, Fraser DW, Tsai TR, Redus MA, Dowdle WR. 1977. Legionnaires' disease: isolation of a bacterium and demonstration of its role in other respiratory disease. N Engl J Med 297:1197–1203. 10.1056/NEJM197712012972202. [DOI] [PubMed] [Google Scholar]
  • 6.Muder RR, Yu VL, Woo AH. 1986. Mode of transmission of Legionella pneumophila. A critical review. Arch Intern Med 146:1607–1612. 10.1001/archinte.1986.00360200183030. [DOI] [PubMed] [Google Scholar]
  • 7.Mondino S, Schmidt S, Rolando M, Escoll P, Gomez-Valero L, Buchrieser C. 2020. Legionnaires' disease: state of the art knowledge of pathogenesis mechanisms of Legionella. Annu Rev Pathol 15:439–466. 10.1146/annurev-pathmechdis-012419-032742. [DOI] [PubMed] [Google Scholar]
  • 8.Gao LY, Harb OS, Kwaik YA. 1997. Utilization of similar mechanisms by Legionella pneumophila to parasitize two evolutionarily distant host cells, mammalian macrophages and protozoa. Infect Immun 65:4738–4746. 10.1128/iai.65.11.4738-4746.1997. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Ensminger AW. 2016. Legionella pneumophila, armed to the hilt: justifying the largest arsenal of effectors in the bacterial world. Curr Opin Microbiol 29:74–80. 10.1016/j.mib.2015.11.002. [DOI] [PubMed] [Google Scholar]
  • 10.Steinert M, Hentschel U, Hacker J. 2002. Legionella pneumophila: an aquatic microbe goes astray. FEMS Microbiol Rev 26:149–162. 10.1111/j.1574-6976.2002.tb00607.x. [DOI] [PubMed] [Google Scholar]
  • 11.Storey MV, Winiecka-Krusnell J, Ashbolt NJ, Stenström T-A. 2004. The efficacy of heat and chlorine treatment against thermotolerant Acanthamoebae and Legionellae. Scand J Infect Dis 36:656–662. 10.1080/00365540410020785. [DOI] [PubMed] [Google Scholar]
  • 12.Cervero-Aragó S, Rodríguez-Martínez S, Puertas-Bennasar A, Araujo RM. 2015. Effect of common drinking eater disinfectants, chlorine and heat, on free Legionella and amoebae-associated Legionella. PLoS One 10:e0134726. 10.1371/journal.pone.0134726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Kilvington S, Price J. 1990. Survival of Legionella pneumophila within cysts of Acanthamoeba polyphaga following chlorine exposure. J Appl Bacteriol 68:519–525. 10.1111/j.1365-2672.1990.tb02904.x. [DOI] [PubMed] [Google Scholar]
  • 14.Barker J, Brown MR, Collier PJ, Farrell I, Gilbert P. 1992. Relationship between Legionella pneumophila and Acanthamoeba polyphaga: physiological status and susceptibility to chemical inactivation. Appl Environ Microbiol 58:2420–2425. 10.1128/aem.58.8.2420-2425.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Liu BL, Everson JS, Fane B, Giannikopoulou P, Vretou E, Lambden PR, Clarke IN. 2000. Molecular characterization of a bacteriophage (Chp2) from Chlamydia psittaci. J Virol 74:3464–3469. 10.1128/jvi.74.8.3464-3469.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Hsia R, Ohayon H, Gounon P, Dautry-Varsat A, Bavoil PM. 2000. Phage infection of the obligate intracellular bacterium, Chlamydia psittaci strain guinea pig inclusion conjunctivitis. Microbes Infect 2:761–772. 10.1016/s1286-4579(00)90356-3. [DOI] [PubMed] [Google Scholar]
  • 17.Hsia RC, Ting LM, Bavoil PM. 2000. Microvirus of Chlamydia psittaci strain guinea pig inclusion conjunctivitis: isolation and molecular characterization. Microbiology 146:1651–1660. 10.1099/00221287-146-7-1651. [DOI] [PubMed] [Google Scholar]
  • 18.Garner SA, Everson JS, Lambden PR, Fane BA, Clarke IN. 2004. Isolation, molecular characterisation and genome sequence of a bacteriophage (Chp3) from Chlamydophila pecorum. Virus Genes 28:207–214. 10.1023/B:VIRU.0000016860.53035.f3. [DOI] [PubMed] [Google Scholar]
  • 19.Lammertyn E, Vande Voorde J, Meyen E, Maes L, Mast J, Anné J. 2008. Evidence for the presence of Legionella bacteriophages in environmental water samples. Microb Ecol 56:191–197. 10.1007/s00248-007-9325-z. [DOI] [PubMed] [Google Scholar]
  • 20.Grigor'ev AA, Bondarev VP, Borisevich IV, Darmov IV, Mironin AV, Zolotarev AG, Pogorel'skiĭ IP, Ianov DS. 2008. Temperate Legionella bacteriophage: discovery and characteristics (in Russian). Zh Mikrobiol Epidemiol Immunobiol 2008(4):86–88. [PubMed] [Google Scholar]
  • 21.Luneberg E, Mayer B, Daryab N, Kooistra O, Zahringer U, Rohde M, Swanson J, Frosch M. 2004. Chromosomal insertion and excision of a 30 kb unstable genetic element is responsible for phase variation of lipopolysaccharide and other virulence determinants in Legionella pneumophila. Mol Microbiol 39:1259–1271. 10.1111/j.1365-2958.2001.02314.x. [DOI] [PubMed] [Google Scholar]
  • 22.Rao C, Guyard C, Pelaz C, Wasserscheid J, Bondy-Denomy J, Dewar K, Ensminger AW. 2016. Active and adaptive Legionella CRISPR-Cas reveals a recurrent challenge to the pathogen. Cell Microbiol 18:1319–1338. 10.1111/cmi.12586. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Gomez-Valero L, Rusniok C, Rolando M, Neou M, Dervins-Ravault D, Demirtas J, Rouy Z, Moore RJ, Chen H, Petty NK, Jarraud S, Etienne J, Steinert M, Heuner K, Gribaldo S, Digue CM, Glöckner G, Hartland EL, Buchrieser C. 2014. Comparative analyses of Legionella species identifies genetic features of strains causing Legionnaires disease. Genome Biol 15:505. 10.1186/s13059-014-0505-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.D'Auria G, Jimenez-Hernandez N, Peris-Bondia F, Moya A, Latorre A. 2010. Legionella pneumophila pangenome reveals strain-specific virulence factors. BMC Genomics 11:181. 10.1186/1471-2164-11-181. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Ginevra C, Jacotin N, Diancourt L, Guigon G, Arquilliere R, Meugnier H, Descours G, Vandenesch F, Etienne J, Lina G, Caro V, Jarraud S. 2012. Legionella pneumophila sequence type 1/Paris pulsotype subtyping by spoligotyping. J Clin Microbiol 50:696–701. 10.1128/JCM.06180-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Rao C, Chin D, Ensminger AW. 2017. Priming in a permissive type I-C CRISPR-Cas system reveals distinct dynamics of spacer acquisition and loss. RNA 23:1525–1538. 10.1261/rna.062083.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Deecker SR, Ensminger AW. 2020. Type I-F CRISPR-Cas distribution and array dynamics in Legionella pneumophila. G3 (Bethesda) 10:1039–1050. 10.1534/g3.119.400813. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Mojica FJM, Díez-Villaseñor C, Soria E, Juez G. 2000. Biological significance of a family of regularly spaced repeats in the genomes of archaea, bacteria and mitochondria. Mol Microbiol 36:244–246. 10.1046/j.1365-2958.2000.01838.x. [DOI] [PubMed] [Google Scholar]
  • 29.Jansen R, van Embden JD, Gaastra W, Schouls LM. 2002. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol 43:1565–1575. 10.1046/j.1365-2958.2002.02839.x. [DOI] [PubMed] [Google Scholar]
  • 30.Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709–1712. 10.1126/science.1138140. [DOI] [PubMed] [Google Scholar]
  • 31.McGinn J, Marraffini LA. 2019. Molecular mechanisms of CRISPR-Cas spacer acquisition. Nat Rev Microbiol 17:7–12. 10.1038/s41579-018-0071-7. [DOI] [PubMed] [Google Scholar]
  • 32.Brouns SJJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJH, Snijders APL, Dickman MJ, Makarova KS, Koonin EV, van der Oost J. 2008. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321:960–964. 10.1126/science.1159689. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Jore MM, Lundgren M, van Duijn E, Bultema JB, Westra ER, Waghmare SP, Wiedenheft B, Pul Ü, Wurm R, Wagner R, Beijer MR, Barendregt A, Zhou K, Snijders APL, Dickman MJ, Doudna JA, Boekema EJ, Heck AJR, van der Oost J, Brouns SJJ. 2011. Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat Struct Mol Biol 18:529–536. 10.1038/nsmb.2019. [DOI] [PubMed] [Google Scholar]
  • 34.Wiedenheft B, van Duijn E, Bultema JB, Bultema J, Waghmare SP, Waghmare S, Zhou K, Barendregt A, Westphal W, Heck AJR, Heck A, Boekema EJ, Boekema E, Dickman MJ, Dickman M, Doudna JA. 2011. RNA-guided complex from a bacterial immune system enhances target recognition through seed sequence interactions. Proc Natl Acad Sci U S A 108:10092–10097. 10.1073/pnas.1102716108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Westra ER, van Erp PBG, Künne T, Wong SP, Staals RHJ, Seegers CLC, Bollen S, Jore MM, Semenova E, Severinov KV, de Vos WM, Dame RT, de Vries R, Brouns SJJ, van der Oost J. 2012. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3. Mol Cell 46:595–605. 10.1016/j.molcel.2012.03.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Soto-Perez P, Bisanz JE, Berry JD, Lam KN, Bondy-Denomy J, Turnbaugh PJ. 2019. CRISPR-Cas system of a prevalent human gut bacterium reveals hyper-targeting against phages in a human virome catalog. Cell Host Microbe 26:325–335.e5. 10.1016/j.chom.2019.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Stern A, Mick E, Tirosh I, Sagy O, Sorek R. 2012. CRISPR targeting reveals a reservoir of common phages associated with the human gut microbiome. Genome Res 22:1985–1994. 10.1101/gr.138297.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Zhang Q, Rho M, Tang H, Doak TG, Ye Y. 2013. CRISPR-Cas systems target a diverse collection of invasive mobile genetic elements in human microbiomes. Genome Biol 14:R40. 10.1186/gb-2013-14-4-r40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Shmakov SA, Sitnik V, Makarova KS, Wolf YI, Severinov KV, Koonin EV. 2017. The CRISPR spacer space is dominated by sequences from species-specific mobilomes. mBio 8:e01397–17. 10.1128/mBio.01397-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Shmakov SA, Wolf YI, Savitskaya EE, Severinov KV, Koonin EV. 2020. Mapping CRISPR spaceromes reveals vast host-specific viromes of prokaryotes. Commun Biol 3:321. 10.1038/s42003-020-1014-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Akhter S, Aziz RK, Edwards RA. 2012. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Research 40:e126. 10.1093/nar/gks406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Roux S, Enault F, Hurwitz BL, Sullivan MB. 2015. VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985. 10.7717/peerj.985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Abbott ZD, Flynn KJ, Byrne BG, Mukherjee S, Kearns DB, Swanson MS. 2016. csrT represents a new class of csrA-like regulatory genes associated with integrative conjugative elements of Legionella pneumophila. J Bacteriol 198:553–564. 10.1128/JB.00732-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421. 10.1186/1471-2105-10-421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Zimmermann L, Stephens A, Nam S-Z, Rau D, Kübler J, Lozajic M, Gabler F, Söding J, Lupas AN, Alva V. 2018. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J Mol Biol 430:2237–2243. 10.1016/j.jmb.2017.12.007. [DOI] [PubMed] [Google Scholar]
  • 46.Couvin D, Bernheim A, Toffano-Nioche C, Touchon M, Michalik J, Néron B, Rocha EPC, Vergnaud G, Gautheret D, Pourcel C. 2018. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res 46:W246–W251. 10.1093/nar/gky425. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Biswas A, Staals RHJ, Morales SE, Fineran PC, Brown CM. 2016. CRISPRDetect: a flexible algorithm to define CRISPR arrays. BMC Genomics 17:356. 10.1186/s12864-016-2627-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Gomez-Valero L, Rusniok C, Jarraud S, Vacherie B, Rouy Z, Barbe V, Medigue C, Etienne J, Buchrieser C. 2011. Extensive recombination events and horizontal gene transfer shaped the Legionella pneumophila genomes. BMC Genomics 12:536. 10.1186/1471-2164-12-536. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I. 2019. GenBank. Nucleic Acids Res 47:D94–D99. 10.1093/nar/gky989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Amid C, Alako BTF, Balavenkataraman Kadhirvelu V, Burdett T, Burgin J, Fan J, Harrison PW, Holt S, Hussein A, Ivanov E, Jayathilaka S, Kay S, Keane T, Leinonen R, Liu X, Martinez-Villacorta J, Milano A, Pakseresht A, Rahman N, Rajan J, Reddy K, Richards E, Smirnov D, Sokolov A, Vijayaraja S, Cochrane G. 2020. The European Nucleotide Archive in 2019. Nucleic Acids Res 48:D70–D76. 10.1093/nar/gkz1063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Mercante JW, Morrison SS, Raphael BH, Winchell JM. 2016. Complete genome sequences of the historical Legionella pneumophila strains OLDA and Pontiac. Genome Announc 4:e00866-16. 10.1128/genomeA.00866-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Brooks L, Kaze M, Sistrom M. 2019. A curated, comprehensive database of plasmid sequences. Microbiol Resour Announc 8:e01325-18. 10.1128/MRA.01325-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Labonté JM, Suttle CA. 2013. Metagenomic and whole-genome analysis reveals new lineages of gokushoviruses and biogeographic separation in the sea. Front Microbiol 4:404. 10.3389/fmicb.2013.00404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Labonté JM, Hallam SJ, Suttle CA. 2015. Previously unknown evolutionary groups dominate the ssDNA gokushoviruses in oxic and anoxic waters of a coastal marine environment. Front Microbiol 6:315. 10.3389/fmicb.2015.00315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Paez-Espino D, Eloe-Fadrosh EA, Pavlopoulos GA, Thomas AD, Huntemann M, Mikhailova N, Rubin E, Ivanova NN, Kyrpides NC. 2016. Uncovering Earth's virome. Nature 536:425–430. 10.1038/nature19094. [DOI] [PubMed] [Google Scholar]
  • 56.Paez-Espino D, Pavlopoulos GA, Ivanova NN, Kyrpides NC. 2017. Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. Nat Protoc 12:1673–1682. 10.1038/nprot.2017.063. [DOI] [PubMed] [Google Scholar]
  • 57.Chen I-MA, Chu K, Palaniappan K, Pillay M, Ratner A, Huang J, Huntemann M, Varghese N, White JR, Seshadri R, Smirnova T, Kirton E, Jungbluth SP, Woyke T, Eloe-Fadrosh EA, Ivanova NN, Kyrpides NC. 2019. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res 47:D666–D677. 10.1093/nar/gky901. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Overholt WA, Hölzer M, Geesink P, Diezel C, Marz M, Küsel K. 2020. Inclusion of Oxford Nanopore long reads improves all microbial and viral metagenome—assembled genomes from a complex aquifer system. Environ Microbiol 22:4000–4013. 10.1111/1462-2920.15186. [DOI] [PubMed] [Google Scholar]
  • 59.Rokyta DR, Burch CL, Caudle SB, Wichman HA. 2006. Horizontal gene transfer and the evolution of microvirid coliphage genomes. J Bacteriol 188:1134–1142. 10.1128/JB.188.3.1134-1142.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Breitbart M, Fane BA. 2021. Microviridae. Encylopedia of Life Sciences 2:1–14. 10.1002/9780470015902.a0029280. [DOI] [Google Scholar]
  • 61.Pascarel-Devilder MC, Renaudin J, Bove JM. 1986. The spiroplasma virus 4 replicative form cloned in Escherichia coli transfects spiroplasmas. Virology 151:390–393. 10.1016/0042-6822(86)90060-7. [DOI] [PubMed] [Google Scholar]
  • 62.Renaudin J, Pascarel MC, Bove JM. 1987. Spiroplasma virus 4: nucleotide sequence of the viral DNA, regulatory signals, and proposed genome organization. J Bacteriol 169:4950–4961. 10.1128/jb.169.11.4950-4961.1987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Stamburski C, Renaudin J, Bove JM. 1990. Characterization of a promoter and a transcription terminator of Spiroplasma melliferum virus SpV4. J Bacteriol 172:5586–5592. 10.1128/jb.172.10.5586-5592.1990. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Chipman PR, Agbandje-McKenna M, Renaudin J, Baker TS, McKenna R. 1998. Structural analysis of the Spiroplasma virus, SpV4: implications for evolutionary variation to obtain host diversity among the Microviridae. Structure 6:135–145. 10.1016/s0969-2126(98)00016-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Brentlinger KL, Hafenstein S, Novak CR, Fane BA, Borgon R, McKenna R, Agbandje-McKenna M. 2002. Microviridae, a family divided: isolation, characterization, and genome sequence of MH2K, a bacteriophage of the obligate intracellular parasitic bacterium Bdellovibrio bacteriovorus. J Bacteriol 184:1089–1094. 10.1128/jb.184.4.1089-1094.2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Roux S, Krupovic M, Poulet A, Debroas D, Enault F. 2012. Evolution and diversity of the Microviridae viral family through a collection of 81 new complete genomes assembled from virome reads. PLoS One 7:e40418. 10.1371/journal.pone.0040418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Krupovic M, Forterre P. 2011. Microviridae goes temperate: microvirus-related proviruses reside in the genomes of Bacteroidetes. PLoS One 6:e19893. 10.1371/journal.pone.0019893. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Quaiser A, Dufresne A, Ballaud F, Roux S, Zivanovic Y, Colombet J, Sime-Ngando T, Francez A-J. 2015. Diversity and comparative genomics of Microviridae in Sphagnum-dominated peatlands. Front Microbiol 6:375. 10.3389/fmicb.2015.00375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Rosario K, Dayaram A, Marinov M, Ware J, Kraberger S, Stainton D, Breitbart M, Varsani A. 2012. Diverse circular ssDNA viruses discovered in dragonflies (Odonata: Epiprocta). J Gen Virol 93:2668–2681. 10.1099/vir.0.045948-0. [DOI] [PubMed] [Google Scholar]
  • 70. Creasy A, Rosario K, Leigh B, Dishaw L, Breitbart M. 2018. Unprecedented diversity of ssDNA phages from the family Microviridae detected within the gut of a protochordate model organism (Ciona robusta). Viruses 10:404. 10.3390/v10080404.
  • 71. Orton JP, Morales M, Fontenele RS, Schmidlin K, Kraberger S, Leavitt DJ, Webster TH, Wilson MA, Kusumi K, Dolby GA, Varsani A. 2020. Virus discovery in desert tortoise fecal samples: novel circular single-stranded DNA viruses. Viruses 12:143. 10.3390/v12020143.
  • 72. Bryson SJ, Thurber AR, Correa AMS, Orphan VJ, Vega Thurber R. 2015. A novel sister clade to the enterobacteria microviruses (family Microviridae) identified in methane seep sediments. Environ Microbiol 17:3708–3721. 10.1111/1462-2920.12758.
  • 73. Tucker KP, Parsons R, Symonds EM, Breitbart M. 2011. Diversity and distribution of single-stranded DNA phages in the North Atlantic Ocean. ISME J 5:822–830. 10.1038/ismej.2010.188.
  • 74. Roux S, Enault F, Robin A, Ravet V, Personnic S, Theil S, Colombet J, Sime-Ngando T, Debroas D. 2012. Assessing the diversity and specificity of two freshwater viral communities through metagenomics. PLoS One 7:e33641. 10.1371/journal.pone.0033641.
  • 75. Hopkins M, Kailasan S, Cohen A, Roux S, Tucker KP, Shevenell A, Agbandje-McKenna M, Breitbart M. 2014. Diversity of environmental single-stranded DNA phages revealed by PCR amplification of the partial major capsid protein. ISME J 8:2093–2103. 10.1038/ismej.2014.43.
  • 76. Reavy B, Swanson MM, Cock PJA, Dawson L, Freitag TE, Singh BK, Torrance L, Mushegian AR, Taliansky M. 2015. Distinct circular single-stranded DNA viruses exist in different soil types. Appl Environ Microbiol 81:3934–3945. 10.1128/AEM.03878-14.
  • 77. Zhong X, Guidoni B, Jacas L, Jacquet S. 2015. Structure and diversity of ssDNA Microviridae viruses in two peri-alpine lakes (Annecy and Bourget, France). Res Microbiol 166:644–654. 10.1016/j.resmic.2015.07.003.
  • 78. Han L-L, Yu D-T, Zhang L-M, Shen J-P, He J-Z. 2017. Genetic and functional diversity of ubiquitous DNA viruses in selected Chinese agricultural soils. Sci Rep 7:45142. 10.1038/srep45142.
  • 79. Tikhe CV, Husseneder C. 2017. Metavirome sequencing of the termite gut reveals the presence of an unexplored bacteriophage community. Front Microbiol 8:2548. 10.3389/fmicb.2017.02548.
  • 80. Kraberger S, Waits K, Ivan J, Newkirk E, VandeWoude S, Varsani A. 2018. Identification of circular single-stranded DNA viruses in faecal samples of Canada lynx (Lynx canadensis), moose (Alces alces) and snowshoe hare (Lepus americanus) inhabiting the Colorado San Juan Mountains. Infect Genet Evol 64:1–8. 10.1016/j.meegid.2018.06.001.
  • 81. Wang H, Ling Y, Shan T, Yang S, Xu H, Deng X, Delwart E, Zhang W. 2019. Gut virome of mammals and birds reveals high genetic diversity of the family Microviridae. Virus Evol 5:vez013. 10.1093/ve/vez013.
  • 82. Kirchberger PC, Ochman H. 2020. Resurrection of a global, metagenomically defined gokushovirus. Elife 9:e51599. 10.7554/eLife.51599.
  • 83. Suttle CA, Chan AM, Cottrell MT. 1991. Use of ultrafiltration to isolate viruses from seawater which are pathogens of marine phytoplankton. Appl Environ Microbiol 57:721–726. 10.1128/aem.57.3.721-726.1991.
  • 84. Smith HO, Hutchison CA, Pfannkoch C, Venter JC. 2003. Generating a synthetic genome by whole genome assembly: phi X174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci U S A 100:15440–15445. 10.1073/pnas.2237126100.
  • 85. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. 10.1093/bioinformatics/btu153.
  • 86. Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. 10.1186/1471-2105-5-113.
  • 87. Price MN, Dehal PS, Arkin AP. 2010. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490. 10.1371/journal.pone.0009490.
  • 88. Letunic I, Bork P. 2019. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 47:W256–W259. 10.1093/nar/gkz239.
  • 89. Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. 10.1186/1471-2105-11-119.
  • 90. Li L, Stoeckert CJ, Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189. 10.1101/gr.1224503.
  • 91. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. 2017. GGTREE: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol 8:28–36. 10.1111/2041-210X.12628.
  • 92. Wickham H. 2016. ggplot2: elegant graphics for data analysis. Springer-Verlag, New York, NY. ISBN: 978-3-319-24277-4.
  • 93. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079. 10.1093/bioinformatics/btp352.
  • 94. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. 10.1093/bioinformatics/btq033.
  • 95. Dion M, Labrie S, Shah S, Moineau S. 2018. CRISPRStudio: a user-friendly software for rapid CRISPR array visualization. Viruses 10:602. 10.3390/v10110602.
  • 96. Wagih O. 2017. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33:3645–3647. 10.1093/bioinformatics/btx469.
  • 97. Hahne F, Ivanek R. 2016. Visualizing genomic data using Gviz and Bioconductor. Methods Mol Biol 1418:335–351. 10.1007/978-1-4939-3578-9_16.

Associated Data

Supplementary Materials

Supplemental file 1

Figures S1 to S4. AEM.00467-21-s0001.pdf (PDF, 256 KB)

Supplemental file 2

Tables S1 to S8. AEM.00467-21-s0002.xlsx (XLSX, 485.3 KB)

Data Availability Statement

Public sequence data for published and unpublished studies are available from GenBank (NCBI), the European Nucleotide Archive (ENA), the Joint Genome Institute (JGI), or Dryad. Accession numbers and data sources are listed in Tables S1, S7, and S8 in the supplemental material.


Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)
