Abstract
Most bacteria and archaea possess multiple antiviral defence systems that protect against infection by phages, archaeal viruses and mobile genetic elements. Our understanding of the diversity of defence systems has increased greatly in the last few years, and many more systems likely await discovery. To identify defence-related genes, we recently developed the Prokaryotic Antiviral Defence LOCator (PADLOC) bioinformatics tool. To increase the accessibility of PADLOC, we describe here the PADLOC web server (freely available at https://padloc.otago.ac.nz), allowing users to analyse whole genomes, metagenomic contigs, plasmids, phages and archaeal viruses. The web server includes a more than 5-fold increase in defence system types detected (since the first release) and expanded functionality enabling detection of CRISPR arrays and retron ncRNAs. Here, we provide user information such as input options, description of the multiple outputs, limitations and considerations for interpretation of the results, and guidance for subsequent analyses. The PADLOC web server also houses a precomputed database of the defence systems in > 230,000 RefSeq genomes. These data reveal two taxa, Campylobacterota and Spirochaetota, with unusual defence system diversity and abundance. Overall, the PADLOC web server provides a convenient and accessible resource for the detection of antiviral defence systems.
Graphical Abstract

The PADLOC web server is a one-stop resource for the identification of antiviral defence systems in microbial genomes.
INTRODUCTION
Diverse antiviral defence systems have evolved in bacteria and archaea that defend against infection by their viruses and mobile genetic elements. There are over 60 known broad families of defence systems, with more than 20 distinct system types discovered in the past five years (precise tallies are difficult since classification schemes, such as class, type and subtype, vary between families of systems and the mechanisms of many systems remain unknown) (1–5). As such, system discovery has greatly outpaced the development of tools that make use of these new insights. To provide widespread accessibility to known and newly discovered system types, and ensure consistency between system annotations, we recently developed the Prokaryotic Antiviral Defence Locator (PADLOC) tool as a framework to systematically identify antiviral defence systems (6). To simplify the use of PADLOC, we have developed the PADLOC web server, which expands the functionality of PADLOC, serves as a convenient and accessible interface for using the tool, and provides an extensive database of precomputed results – currently for more than 230,000 bacterial and archaeal genomes.
Details regarding operation and benchmarking of the PADLOC tool itself have been described elsewhere (6). However, there are some important aspects of system detection that users of the PADLOC web server should understand when interpreting results. Genes encoding defence system proteins are identified by searching their protein sequences with a curated database of profile Hidden Markov Models (HMMs), currently representing > 700 families of defence-related proteins. Many of the protein families are represented by multiple HMMs (for example, there are currently 45 HMMs from various sources representing different Cas10 clades). Potential matches are filtered to remove low-scoring hits (based on E-value and coverage thresholds). PADLOC then uses a set of system definition models to determine whether the genetic synteny requirements are met for each possible system classification (Figure 1). This approach to multi-gene system identification has been applied successfully in the past for the detection of CRISPR-Cas (7,8) and protein secretion systems (9). In addition to detecting protein-coding genes, the PADLOC web server includes new functionality to detect CRISPR arrays and ncRNAs, such as retron msr-msd elements. Here, we discuss this added PADLOC functionality, the addition of many new systems to the database, limitations and important considerations for interpretation of the results, and guidance for subsequent analyses.
Figure 1.
Defence system detection with PADLOC. (A) Genes encoding putative defence system proteins are identified using profile HMMs, then compared against the system definition models to determine whether a complete system is present. (B) For example, detection of a DISARM type I system requires genes encoding all five core components (DrmA, DrmB, DrmC, DrmD, DrmMI) to be present. (C) If the minimum number of genes is not met, the system is not reported. (D) If any genes prohibited for a specific system definition are present (in the case of DISARM type I, drmE is prohibited), the system is rejected, but can instead be reported as a match to a different system definition; in this example, as a DISARM type II system (requiring genes encoding DrmA, DrmB, DrmC, DrmE and DrmMII).
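The decision logic summarised in Figure 1 can be illustrated with a minimal sketch. The SystemModel class, its field names and the matches and classify functions below are hypothetical simplifications for illustration and are not the PADLOC system model format; the sketch also assumes the input genes have already been judged co-localised, so the synteny (gene spacing) requirements are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemModel:
    """Hypothetical, simplified stand-in for a system definition model."""
    name: str
    core: frozenset          # genes that count towards the system
    min_core: int            # minimum number of core genes required
    prohibited: frozenset = frozenset()  # genes that reject the definition


def matches(model, genes_in_locus):
    """Return True if the co-localised genes satisfy the model."""
    genes = set(genes_in_locus)
    if genes & model.prohibited:
        return False  # any prohibited gene rejects this definition
    return len(genes & model.core) >= model.min_core


def classify(models, genes_in_locus):
    """Return the names of all models whose requirements are met."""
    return [m.name for m in models if matches(m, genes_in_locus)]


# Gene sets taken from the DISARM example in Figure 1.
DISARM_I = SystemModel(
    name="DISARM_I",
    core=frozenset({"DrmA", "DrmB", "DrmC", "DrmD", "DrmMI"}),
    min_core=5,
    prohibited=frozenset({"DrmE"}),
)
DISARM_II = SystemModel(
    name="DISARM_II",
    core=frozenset({"DrmA", "DrmB", "DrmC", "DrmE", "DrmMII"}),
    min_core=5,
)
MODELS = [DISARM_I, DISARM_II]

if __name__ == "__main__":
    print(classify(MODELS, ["DrmA", "DrmB", "DrmC", "DrmD", "DrmMI"]))
    print(classify(MODELS, ["DrmA", "DrmB", "DrmC", "DrmD"]))
    print(classify(MODELS, ["DrmA", "DrmB", "DrmC", "DrmE", "DrmMII"]))
```

In this sketch a locus lacking one core gene matches neither definition, while a locus containing drmE is rejected by the type I definition but can still satisfy the type II definition.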
MATERIALS AND METHODS
Expansion of the PADLOC defence system database
At its core, the PADLOC web server is based on the PADLOC command line tool (https://github.com/padlocbio/padloc), with a curated database of profile HMMs, HMM scoring thresholds, and system models (https://github.com/padlocbio/padloc-db). Our current understanding of defence system genetics is the result of a collective scientific effort, and the HMMs and system models in the PADLOC database were built and curated using data from many sources. Construction of HMMs for the CBASS and Doron systems, plus several variant systems we discovered, was as previously described (6). We have since expanded the PADLOC web server database to contain > 180 system definitions, including > 3,500 profile HMMs. Where profile HMMs were made available by the authors of papers describing new defence system types (10–12), or databases and tools to detect subsets of systems (13–15), these HMMs were assigned PADLOC HMM accessions (e.g. PLDC12345) and added to the PADLOC database (the original HMM names were retained for traceability). Where multiple sequence alignments were available (16,17), we realigned the sequences using MUSCLE (18) and built HMMs with HMMER3 (19). Where neither HMMs nor sequence alignments were available but a list of relevant proteins was (8,20–39), we either used our sequence clustering and HMM generation pipeline described previously (6), or aligned the sequences with MUSCLE, manually curated the alignments to remove outlier sequences, then built HMMs with HMMER3. In the absence of supplied lists of homologs, we used example sequences from experimentally verified defence systems as seeds for BLAST searches, then aligned, curated and built HMMs, as above (22,40–69).
Where possible, the data source and appropriate reference for each HMM are listed in the PADLOC database HMM metadata file (hmm_meta.txt, available from the PADLOC database repository). We encourage PADLOC users to recognize the importance and value of these data used to build the PADLOC web server by citing the original sources. Models will continue to be added and updated periodically as more defence systems are discovered. We welcome and encourage submissions of new defence systems, including HMMs, multiple sequence alignments, or lists of relevant protein sequences or database accessions. Similarly, feedback to improve the sensitivity and specificity of PADLOC, updates to citation links, and suggestions to improve defence system and protein nomenclature will help ensure PADLOC remains a useful community resource.
Detection of non-coding sequences
Since the command line PADLOC tool detects only protein-coding genes, yet many defence systems contain non-coding RNAs, we integrated detection of CRISPR arrays and retron-associated ncRNAs (msr-msd elements) into the web server. CRISPR arrays are detected with a customized version of CRISPRDetect (70) using the arguments: array_quality_score_cutoff 2.5; minimum_word_repeatation 3; word_length 11; minimum_no_of_repeats 3; repeat_length_cutoff 11; max_gap_between_crisprs 250. The resulting crispr.gff output file is supplied to PADLOC using the --crispr input option and the human-readable crispr.txt output file is made available for user download. Potential ncRNAs associated with retrons are identified by searching a database of msr-msd element covariance models (available from the PADLOC database repository), against each genome sequence using Infernal's cmsearch (71) with the arguments: Z 10; FZ 500. The Infernal output is filtered to only include hits passing the inclusion threshold (E-value = 0.01), then loaded into PADLOC using the --ncrna input option. In specific PADLOC system definition models (e.g. for retrons), ncRNAs are listed as ‘ncRNA’ in the core, accessory or prohibited gene lists, as required. As such, any identified ncRNAs contribute to the total required gene count for each relevant defence system.
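As an illustration of the inclusion-threshold filtering step, the following minimal sketch keeps only hits with an E-value at or below 0.01. It assumes the hits are available in the standard tabular layout written by cmsearch with --tblout (E-value in the sixteenth whitespace-separated column); whether the web server parses this particular output is not stated above, and the function name, model name and example hits are hypothetical.

```python
def filter_cmsearch_hits(tblout_lines, max_evalue=0.01):
    """Keep hits from cmsearch tabular output with E-value <= max_evalue.

    Assumes the default --tblout layout: target name, accession, query name,
    accession, mdl, mdl from, mdl to, seq from, seq to, strand, trunc, pass,
    gc, bias, score, E-value, inc, description.
    """
    hits = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(None, 17)
        if len(fields) < 17:
            continue  # not a complete hit line
        evalue = float(fields[15])
        if evalue <= max_evalue:
            hits.append({
                "target": fields[0],
                "query": fields[2],
                "start": int(fields[7]),
                "end": int(fields[8]),
                "strand": fields[9],
                "evalue": evalue,
            })
    return hits


# Made-up example lines; the model name is a placeholder.
EXAMPLE_TBLOUT = [
    "#target name accession query name accession mdl ...",
    "contig_1 - example_msr_msd - cm 1 150 1200 1349 + no 1 0.50 0.0 85.3 1.2e-15 ! -",
    "contig_2 - example_msr_msd - cm 1 150 300 420 - no 1 0.48 0.0 12.1 0.5 ? -",
]

if __name__ == "__main__":
    for hit in filter_cmsearch_hits(EXAMPLE_TBLOUT):
        print(hit)
```

Only the first example hit passes the threshold; the second is discarded because its E-value exceeds 0.01.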
Precomputed RefSeq data and pseudogenes
For the precomputed PADLOC dataset (currently based on RefSeq v209 (72)), we used the [assembly]_genomic.fna, [assembly]_genomic.gff and [assembly]_protein.faa files for each genome assembly from the RefSeq FTP server. First, CRISPR arrays and retron-associated ncRNAs were identified by running CRISPRDetect and Infernal, respectively (as above), with [assembly]_genomic.fna as input. The resulting GFF-formatted CRISPRDetect outputs were saved for input to PADLOC and the more detailed, human-readable output files were loaded to the PADLOC web server for user download. Next, we pre-processed the [assembly]_genomic.gff and [assembly]_protein.faa files to allow increased detection of defence systems containing pseudogenes. The NCBI prokaryotic genome annotation pipeline (PGAP) identifies genes that contain frameshifts, nonsense stop codons, or appear otherwise incomplete, as pseudogenes (73). The coordinates of each pseudogene are reported in the [assembly]_genomic.gff, but the corresponding protein sequences are not included in the [assembly]_protein.faa. Since PADLOC relies on any potential defence system protein sequences to be present in the input file, pseudogenes belonging to defence systems would not normally be identified. The PGAP annotates each pseudogene with the accession of the full protein sequence used to infer the product of the pseudogene (e.g. the Bacillus cereus VD146 assembly GCF_000399425.1 has a pseudogene with locus tag IK1_RS32735 that is labelled as similar to the CRISPR-associated protein Cas4 of Oceanobacillus massiliensis WP_010649895.1). Therefore, we substituted the pseudogenes in each RefSeq genome with the sequence of their inferential protein (where available). Lastly, PADLOC was run using the pseudogene-corrected .gff and .faa inputs, plus the CRISPR array and ncRNA inputs (as above).
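The lookup of the inferential protein for each pseudogene can be sketched as follows. This is an illustration only, not the pre-processing code used for the precomputed dataset: it assumes that pseudogene CDS rows in the GFF carry pseudo=true, a locus_tag attribute, and an inference attribute of the form 'similar to AA sequence:RefSeq:[accession]'. The function names and the coordinates in the example row are made up; the locus tag and accession are those of the example above.

```python
import re

# Assumed form of the inference attribute on pseudogene rows.
ACCESSION_RE = re.compile(r"similar to AA sequence:[^:;,]*:([A-Z]{2}_\d+\.\d+)")


def parse_attributes(column9):
    """Split the GFF3 attribute column into a key -> value dictionary."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def pseudogene_substitutes(gff_lines):
    """Map pseudogene locus tags to the accession of their inferential protein."""
    substitutes = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("pseudo") != "true":
            continue
        match = ACCESSION_RE.search(attrs.get("inference", ""))
        if match and "locus_tag" in attrs:
            substitutes[attrs["locus_tag"]] = match.group(1)
    return substitutes


# Example row: coordinates and contig name are invented for illustration.
EXAMPLE_GFF = [
    "##gff-version 3",
    "\t".join([
        "contig_1", "Protein Homology", "CDS", "1000", "1600", ".", "+", "0",
        "ID=cds-IK1_RS32735;locus_tag=IK1_RS32735;pseudo=true;"
        "inference=COORDINATES: similar to AA sequence:RefSeq:WP_010649895.1",
    ]),
]

if __name__ == "__main__":
    print(pseudogene_substitutes(EXAMPLE_GFF))
```

The resulting mapping could then be used to fetch each substitute sequence and add it to the protein FASTA under the pseudogene's locus tag.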
Implementation
The core PADLOC tool is implemented in R, with some input handling using Bash and Python (primarily Biopython (74)). The PADLOC web server was built using the Django Framework (https://www.djangoproject.com). User jobs are identified by unique and anonymous job identifiers and are not accessible by other users. Users can access their results (tracked using cookies) until their browser cookies are cleared, the results are removed manually by the user, or until they expire (currently after 10 days). On the user side, anonymous job identifiers are replaced with the user-specific job name and downloaded output files are prefixed by the job name. When input files are uploaded to the server, a pre-processing script is used to detect the source format of the files and convert these to the default inputs for PADLOC (.gff and .faa with RefSeq formatting). For example, RAST-formatted GenBank files are identified by the presence of the string ‘rasttk’; the ‘db_xref’ field is then changed to ‘locus_tag’, and Biopython is used to output PADLOC-compatible input files. PADLOC also contains a ‘--fix-prodigal’ option to natively parse Prodigal-formatted .gff and .faa file pairs, which the pre-processing script detects by searching for the string ‘Prodigal’ in the uploaded .gff file. Although the command line version of PADLOC includes a wrapper for gene-calling with Prodigal (allowing input of unannotated nucleotide sequences), the web server runs Prodigal during the pre-processing stage, allowing different gene-calling settings to be used for inputs > 100 kb, versus shorter sequences (see the Prodigal documentation for an explanation of the rationale behind this). For users wanting to analyse unannotated plasmid sequences, we recommend including the host genome sequence within a multi-fasta input file before uploading to the PADLOC web server (to improve the quality of gene predictions with Prodigal).
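The string-based format detection described above can be sketched as follows. This is not the web server's pre-processing script: only the ‘rasttk’ and ‘Prodigal’ checks come from the description above, whereas the function names, the returned labels and the generic GenBank, FASTA and GFF checks are assumptions added for illustration. The choice of Prodigal mode either side of the 100 kb cut-off is likewise an assumption, since the settings used by the server are not specified here.

```python
def detect_source_format(text):
    """Guess the source format of an uploaded file from its contents."""
    stripped = text.lstrip()
    if "rasttk" in text:
        return "rast_genbank"      # RAST-formatted GenBank file
    if "Prodigal" in text:
        return "prodigal_gff"      # GFF produced by Prodigal
    if stripped.startswith("LOCUS"):
        return "genbank"           # assumed generic GenBank check
    if stripped.startswith(">"):
        return "fasta"             # assumed generic FASTA check
    if stripped.startswith("##gff-version"):
        return "gff"               # assumed generic GFF3 check
    return "unknown"


def choose_prodigal_mode(total_length_bp, cutoff_bp=100_000):
    """Pick a gene-calling mode by input length (assumed mapping).

    Assumption: inputs longer than the cut-off use Prodigal's default
    single-genome mode and shorter inputs use its metagenomic mode.
    """
    return "single" if total_length_bp > cutoff_bp else "meta"


if __name__ == "__main__":
    print(detect_source_format("##gff-version  3\ncontig_1\tProdigal_v2.6.3\tCDS"))
    print(choose_prodigal_mode(4_600_000))
    print(choose_prodigal_mode(35_000))
```

Checking for the tool-specific strings before the generic format signatures matters in this sketch, because a Prodigal GFF file also begins with a GFF version line.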
Once PADLOC is run, the output is passed to a post-processing script that reformats and outputs the data in a custom machine-readable format that is used to generate an interactive genome annotation display (produced using d3.js (https://d3js.org/)) on the corresponding user-job result page.
RESULTS
Input and file handling
Users can analyse archaea, bacteria, metagenome, phage, archaeal virus and plasmid genome files from the ‘Run PADLOC’ page. The PADLOC web server accepts GenBank flat files, nucleotide FASTA, or paired amino acid FASTA and general feature format (GFF3) files as input (Figure 2A). If a GenBank file is provided, nucleotide, protein, and feature information (e.g. gene locations) is extracted using Biopython. If a nucleotide FASTA file is provided, Prodigal (75) is used to predict open reading frames and produce a protein FASTA and GFF3 file. For the best quality results, it is recommended that users supply a GenBank file or amino acid FASTA and GFF3 file where coding sequences have already been called and verified (e.g. with Prodigal or Prokka). Users may wish to download the example genome files to see the expected formatting for each file type (Figure 2B). When nucleotide information is provided, either through a GenBank or nucleotide FASTA file, Infernal (71) is used to detect the ncRNA components of retrons. Users also have the option to run CRISPRDetect (70) to predict CRISPR arrays. This additional information helps to verify and enhance the quality of defence system detection. All input files are passed to the core PADLOC module, which then detects the defence systems specified in the PADLOC database (a current list of systems is provided on the web server). The expected processing time for a typical genome encoding ∼5,000 proteins is less than five minutes. Including the CRISPRDetect option can substantially increase the run time in some cases, but usually only adds a couple of minutes.
Figure 2.
The PADLOC web server pipeline and results. (A) The PADLOC web server handles the pre-processing of user input to allow for additional input types and the identification of CRISPR arrays and ncRNAs. (B) Users can upload their own genomes or run an example genome through the ‘Run PADLOC’ page. (C) User jobs are listed on the ‘My Results’ page. Completed jobs link to their individual results pages, failed jobs link to a log file that provides information on why the genome could not be analysed. (D) The individual results pages display a locus viewer, which shows each identified defence system in the context of its surrounding genes. (E) A summarised version of the results is displayed under the locus viewer for an initial inspection of the systems identified. (F) Users can download the full PADLOC output and other raw files from the bottom of the page.
Output and interpretation
Each job submitted by a user is listed on the ‘My Results’ page (Figure 2C), which includes details of the job status and links to completed jobs. If the job failed, a link is provided to download the corresponding log file. The most common reason for a failed job is incorrect input formatting. For successful jobs, individual result pages contain information about interpreting the output, an interactive view of the locus structure of each detected defence system (Figure 2D), a summary table of systems detected (Figure 2E), and options to download the output files (Figure 2F). The main PADLOC output file (.csv format) lists all systems identified, with one gene per row. The file contains information regarding the type of system detected, the proteins present, their location in the genome, and details about the confidence of detection. The most important values to consider when evaluating detection confidence are the full sequence and domain E-values (full.seq.E.value and domain.iE.value), and the target and HMM coverages (target.coverage and hmm.coverage). Usually, hits with large E-values (indicating low statistical significance) and low coverages should be treated with caution. In general, multi-gene defence systems are detected with greater specificity than systems encoded by single genes.
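As a starting point for this kind of inspection, the following minimal sketch flags rows of the output file whose E-values or coverages warrant a closer look. The four scoring column names and target.name are those given in the text; the ‘system’ column name, the cut-off values (E-value 1e-5, coverage 0.8), the assumption that coverages are expressed as fractions between 0 and 1, and the example rows are illustrative assumptions rather than PADLOC defaults.

```python
import csv
import io


def flag_low_confidence(rows, max_evalue=1e-5, min_coverage=0.8):
    """Return rows with a large E-value or a low HMM/target coverage."""
    flagged = []
    for row in rows:
        worst_evalue = max(float(row["full.seq.E.value"]),
                           float(row["domain.iE.value"]))
        lowest_coverage = min(float(row["target.coverage"]),
                              float(row["hmm.coverage"]))
        if worst_evalue > max_evalue or lowest_coverage < min_coverage:
            flagged.append(row)
    return flagged


# Made-up rows standing in for a downloaded PADLOC .csv file.
EXAMPLE_CSV = """system,target.name,full.seq.E.value,domain.iE.value,target.coverage,hmm.coverage
example_system_A,protein_1,1e-80,1e-78,0.95,0.97
example_system_B,protein_2,1e-3,5e-3,0.90,0.35
example_system_C,protein_3,1e-40,1e-38,0.40,0.92
"""

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(EXAMPLE_CSV))
    for row in flag_low_confidence(reader):
        print(row["system"], row["target.name"])
```

Flagged rows are candidates for manual review rather than automatic rejection, since a partial alignment may still reflect a fusion protein or a divergent homolog.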
Considerations and limitations
As with any computational approach to inferring gene function, there are several potential limitations that users should consider when interpreting PADLOC web server outputs. Many defence proteins contain ubiquitous domains and their HMMs are more likely to detect spurious hits. For example, PtuA (Septu), several retron proteins and Old nucleases contain similar ATPase domains (11). As a result, PADLOC sometimes reports overlapping system classifications (typically less than 1% of results), which should be resolved via subjective evaluation of the reported scoring parameters and genetic context. For multi-gene systems, the synteny requirements resolve many ambiguities and increase the confidence of system classification (due to the reduced probability of two adjacent false-positive hits). By contrast, identification of single-gene systems is more challenging and requires trade-offs with the HMM scoring cut-offs (E-value and HMM/target coverage thresholds) to achieve an acceptable balance between sensitivity and specificity. As such, false-positive and false-negative results are inevitably more frequent for single-gene systems. In general, the PADLOC scoring thresholds are set more toward sensitivity, with the intention that users interested in further study of identified potential defence system homologs will undertake additional analyses.
As a first step to curating PADLOC results, inspection of the HMM and target alignment coverage scores can reveal potential false-positive classifications (Figure 3A–C). In some cases, very similar proteins differ in function due to the presence or absence of enzymatic sites or functional motifs (Figure 3D). We encourage users to explore subsequent domain-based analyses of defence system proteins using tools such as HHpred (76) to identify protein domains with more granularity. We have also found structure prediction tools such as AlphaFold2 (77) and ColabFold (78) useful in identifying domain folds and boundaries. Once demarcated, predicted domain structures can be searched against protein structure databases such as the PDB (79) or AlphaFold-based databases, using tools like DALI (80) or Foldseek (81). In many cases, this structure-based approach reveals homologs with characterised active sites or functional motifs, which can aid in discrimination between defence system proteins and similar non-orthologous proteins. Users may also find it informative to compare their PADLOC results with DefenseFinder, another tool recently developed for defence system identification (82,83).
Figure 3.
Considerations and limitations of defence system detection using profile HMMs and synteny criteria. (A) Likely hits to defence protein homologs have high alignment coverage between the HMM and target protein, including for multi-domain proteins, as the PADLOC HMMs were typically built from whole proteins rather than individual domains. (B) Users should be wary of cases where only part of the HMM aligns to the target protein, which may lack a domain important for function of defence system homologs. (C) Conversely, some defence protein domains have similarity to domains found within non-defence proteins, which can result in the PADLOC HMMs matching only part of the target protein. However, these cases might also represent defence system fusion proteins, or divergent homologs. (D) Where possible, users should follow up by also verifying the presence of expected active site residues and motifs important for domain fold and function. (E) Some defence systems are similar to non-defence molecular systems, such as the similarity between Wadjet and Muk systems. This is typically not an issue with multi-gene systems where all genes are present, but some [system]_other models may detect such cases where some genes are allowed to be absent. (F) Several [system]_other models allow the detection of fragmented multi-gene systems, which can be reconstructed manually after reviewing the results. (G) Pseudogenes within multi-gene systems are not detected by the default PADLOC workflow, so users should check [system]_other models for the potential presence of additional genes. In this example, a frameshift within jetC means the Wadjet system criteria are not fulfilled (because JetC was not detected). (H) In the precomputed PADLOC RefSeq dataset, pseudogenes were substituted with the protein homologs inferred by the PGAP pipeline, allowing the above example to be classified as a Wadjet type I system. Pseudogenes are indicated by a red outline in the locus viewer and the prefix ‘pseudo_sub’ in the ‘target.name’ column of the output.
The similarity of several defence systems to other molecular systems inevitably leads to a background rate of false-positive system identifications. For example, Wadjet systems are similar to Muk structural maintenance of chromosomes (SMC) systems (21,84,85) (Figure 3E). The canonical Wadjet system comprises four proteins, JetABCD, where JetA, JetB, and JetC share similarity with MukF, MukE, and MukB, respectively. The PADLOC Wadjet system definition requires the full JetABCD set to be present, whereas the MukFEB cases are reported as part of Wadjet ‘other’ systems. Several ‘[system]_other’ models (which generally require only two components of a system to be co-localised) are run alongside the stricter canonical system definitions, to enable identification of systems that might otherwise be overlooked because they are split by contig boundaries (particularly in metagenomes) (Figure 3F), fragmented by multiple intervening genes (e.g. due to MGE insertions), or missing genes owing to sequencing, assembly or gene-calling errors, mutations, or high sequence divergence of some defence proteins (Figure 3G). For the precomputed RefSeq data, we substituted pseudogene products with similar full-length protein sequences, which allows higher-confidence assignment of the example system as Wadjet type I (Figure 3H). The identification of defence system pseudogenes will also be helpful for studies of defence system evolution and turnover. For some systems, such as the Dnd and Pbe phosphorothioation systems, several genes are relatively short (including dndE, pbeB/D) and are often missed by gene prediction tools (33,35). The ‘PT_other’ model will detect the remaining genes and users should then manually check for short coding sequences in the vicinity of the expected location of any absent genes. Lastly, several system definition models specify ‘optional’ genes (e.g. ‘cas_associated’ proteins) that are not necessarily functionally associated with the system. In some cases, these proteins may be uncharacterised independent bona fide defence systems, or might have non-defence functions. To guide users in interpreting results for ‘other’ models and resolving ambiguities, each ‘Results’ page on the PADLOC web server contains a list of known potential ambiguities.
The current snapshot of antiviral defence systems
To provide a current and comprehensive quantitative view of the defence systems in bacteria and archaea, we used PADLOC to search all RefSeq v209 Bacteria and Archaea genomes. These results are available for browsing on the PADLOC web server under the ‘RefSeq results’ page and will be updated periodically with new RefSeq versions and as new defence systems are discovered. Overall, the distribution of different defence systems across different bacteria and archaea is highly varied (Supplementary Figure S1). It should be noted that this analysis includes assemblies of varied completeness, and systems may be underrepresented in taxa with incomplete genomes. Clear differences in the diversity and abundance of defence systems were apparent between phyla (Figure 4A), with several notable outliers including Chlamydiota (very few known defence systems) and Cyanobacteria (many types of defence systems in high abundance). Campylobacterota had a significant bimodal distribution of defence system abundance (Hartigan's dip test (86), P < 0.001), which relates to the Helicobacteraceae relying on a remarkably large number of restriction modification systems in each strain (Figure 4B,C) (87,88). Another interesting phylum was Spirochaetota, which had a significant bimodal distribution of defence system diversity due to Borreliaceae having very few types of defence systems (Hartigan's dip test (86), P < 0.001) (Figure 4D). Almost all Borreliaceae are tick-borne pathogens (89) with characteristically small genomes typical of obligate host-associated pathogenic bacteria (90). Similarly, Treponema pallidum (family Treponemataceae) are host-associated pathogens with small genomes and lack known defence systems (Figure 4D). It remains to be resolved whether the low defence diversity is driven by genome reduction and/or by less exposure to phages and MGEs than free-living relatives. Overall, these examples illustrate that the PADLOC web server can be used to interrogate hypotheses such as these and to then identify candidate taxa and strains in which predictions can be experimentally tested.
Figure 4.
Example analyses of PADLOC data reveal lineage-specific differences in defence system diversity and abundance. (A) Overview of the defence system diversity (the number of unique types of defence systems within a host genome) and abundance (total number of defence systems within a host genome), separated by phylum (per the GTDB taxonomy (91)). Only phyla with more than 50 genomes are displayed. (B) A closer look at defence system diversity and abundance within the Campylobacterota phylum. (C) A breakdown of the abundance of different defence system types within the Helicobacteraceae family of Campylobacterota. Defence system types occurring in less than five genomes are grouped under ‘Other types’. (D) Defence system diversity within Spirochaetota, revealing low defence diversity within Borreliaceae. The Treponemataceae lacking defence systems (types = 0) in this dataset consist entirely of Treponema pallidum.
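The two per-genome measures defined in the Figure 4 legend can be computed as in the following minimal sketch. It assumes the results have been reduced to one (genome, system identifier, system type) record per gene, mirroring the one-gene-per-row output; the function name, record layout and example values are hypothetical.

```python
from collections import defaultdict


def diversity_and_abundance(gene_rows):
    """Compute defence system diversity and abundance for each genome.

    diversity: number of unique system types in the genome
    abundance: total number of systems in the genome
    """
    systems = defaultdict(dict)  # genome -> {system identifier: system type}
    for genome, system_id, system_type in gene_rows:
        # Several genes share one system identifier; count each system once.
        systems[genome][system_id] = system_type
    return {
        genome: {
            "diversity": len(set(by_id.values())),
            "abundance": len(by_id),
        }
        for genome, by_id in systems.items()
    }


# Made-up records: genome_A has two RM systems and one CRISPR-Cas system.
EXAMPLE_ROWS = [
    ("genome_A", 1, "RM_type_II"),
    ("genome_A", 1, "RM_type_II"),
    ("genome_A", 2, "RM_type_II"),
    ("genome_A", 3, "cas_type_I-E"),
    ("genome_B", 1, "RM_type_I"),
]

if __name__ == "__main__":
    print(diversity_and_abundance(EXAMPLE_ROWS))
```

Genomes without any detected system do not appear in the input records, so they would need to be added with zero counts before comparing taxa.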
DISCUSSION
To address the lack of capable and accessible tools for comprehensive defence system identification, we developed the PADLOC web server. Here, we have described the current state of PADLOC and important information regarding usage of the web server and interpretation of the output. PADLOC will continue to evolve as new defence systems are discovered and additional biological insight allows us to fine-tune the parameters of identification. Evaluating the accuracy of detection and adjusting these thresholds accordingly is difficult when only a few experimentally verified examples are available, as is currently the case with most defence systems. PADLOC provides a key initial step for further improvement, by facilitating the identification of many putative systems that can be followed up by functional investigation. Feedback and contributions to PADLOC and the web server are encouraged via the GitHub repository (https://github.com/padlocbio/padloc/issues), including but not limited to the addition of systems and HMMs, suggestions for adjusting thresholds or system classifications and nomenclature, and reporting bugs.
The comprehensive identification of many systems made possible with PADLOC also opens avenues for investigating many interesting biological questions. For example, many putative defence system genes are annotated as pseudogenes. Although pseudogenes can arise from sequencing or assembly errors, they might also include the remnants of defence systems that have become inactivated through mutation. Investigation of these remnants could provide insight into the ancestry and evolution of defence system arsenals that would otherwise remain undetected. In addition, our broad taxonomic analyses revealed notable differences in the diversity and abundance of defence systems in many taxa, raising the question of what factors drive the requirement for more defence against phages, and whether differences are driven by ecological factors such as phage diversity and encounter rates (reviewed in 92). Recent studies have also identified interplay between defence systems, with synergistic or antagonistic effects (42,93,94) and as more types of defence systems are discovered, our understanding of compatibility between systems needs to be revisited. The PADLOC web server provides a convenient and accessible platform to detect suitable candidate strains for experimental work to resolve these outstanding questions.
DATA AVAILABILITY
The PADLOC web server is freely available at https://padloc.otago.ac.nz. This website is open to all users and there is no login requirement. Source code and documentation for installing and running PADLOC locally are freely available from the PADLOC GitHub repository (https://github.com/padlocbio/padloc). The HMMs and system models used by the PADLOC web server are available from the PADLOC database repository (https://github.com/padlocbio/padloc-db).
ACKNOWLEDGEMENTS
Firstly, we thank the many researchers whose work has contributed to the development of PADLOC and the users who have provided constructive feedback to improve the PADLOC database and web server. We are also grateful to members of the Phage-host interactions laboratory at the University of Otago for helpful discussions and feedback. We acknowledge and appreciate the use of the New Zealand eScience Infrastructure (NeSI) high-performance computing facilities in this research, which are funded jointly by NeSI collaborator institutions and the Ministry of Business, Innovation and Employment. N.T. is supported by grant PID2020-113207GB-I00 from the MCIN/AEI/10.13039/501100011033.
Contributor Information
Leighton J Payne, Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand.
Sean Meaden, Biosciences, University of Exeter, Penryn, UK.
Mario R Mestre, Independent Researcher, Spain.
Chris Palmer, Information Technology Services Research and Teaching Group, University of Otago, Dunedin, New Zealand.
Nicolás Toro, Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Granada, Spain.
Peter C Fineran, Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand; Genetics Otago, University of Otago, Dunedin, New Zealand; Bioprotection Aotearoa, University of Otago, Dunedin, New Zealand; Maurice Wilkins Centre for Molecular Biodiscovery, University of Otago, Dunedin, New Zealand.
Simon A Jackson, Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand; Genetics Otago, University of Otago, Dunedin, New Zealand; Bioprotection Aotearoa, University of Otago, Dunedin, New Zealand; Maurice Wilkins Centre for Molecular Biodiscovery, University of Otago, Dunedin, New Zealand.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
This work was supported by the Royal Society of New Zealand Te Apārangi (RSNZ) Marsden Fund, the School of Biomedical Sciences Bequest Fund from the University of Otago and Bioprotection Aotearoa (Tertiary Education Commission, NZ). L.J.P. was supported by a University of Otago Doctoral Scholarship.
Conflict of interest statement. None declared.
REFERENCES
- 1. Bernheim A., Sorek R. The pan-immune system of bacteria: antiviral defence as a community resource. Nat. Rev. Microbiol. 2020; 18:113–119.
- 2. Hampton H.G., Watson B.N.J., Fineran P.C. The arms race between bacteria and their phage foes. Nature. 2020; 577:327–336.
- 3. Isaev A.B., Musharova O.S., Severinov K.V. Microbial arsenal of antiviral defenses – part I. Biochem. Mosc. 2021; 86:319–337.
- 4. Isaev A.B., Musharova O.S., Severinov K.V. Microbial arsenal of antiviral defenses. Part II. Biochem. Mosc. 2021; 86:449–470.
- 5. Tal N., Sorek R. SnapShot: bacterial immunity. Cell. 2022; 185:578–578.
- 6. Payne L.J., Todeschini T.C., Wu Y., Perry B.J., Ronson C.W., Fineran P.C., Nobrega F.L., Jackson S.A. Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types. Nucleic Acids Res. 2021; 49:10868–10878.
- 7. Abby S.S., Néron B., Ménager H., Touchon M., Rocha E.P.C. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS ONE. 2014; 9:e110726.
- 8. Russel J., Pinilla-Redondo R., Mayo-Muñoz D., Shah S.A., Sørensen S.J. CRISPRCasTyper: automated identification, annotation, and classification of CRISPR-Cas loci. CRISPR J. 2020; 3:462–469.
- 9. Abby S.S., Cury J., Guglielmini J., Néron B., Touchon M., Rocha E.P.C. Identification of protein secretion systems in bacterial genomes. Sci. Rep. 2016; 6:23080.
- 10. Burroughs A.M., Iyer L.M., Aravind L. Two novel PIWI families: roles in inter-genomic conflicts in bacteria and Mediator-dependent modulation of transcription in eukaryotes. Biol. Direct. 2013; 8:13.
- 11. Mestre M.R., González-Delgado A., Gutiérrez-Rus L.I., Martínez-Abarca F., Toro N. Systematic prediction of genes functionally associated with bacterial retrons and classification of the encoded tripartite systems. Nucleic Acids Res. 2020; 48:12632–12647.
- 12. Rousset F., Depardieu F., Solange M., Dowding J., Laval A.-L., Lieberman E., Garry D., Rocha E.P.C., Bernheim A., Bikard D. Phages and their satellites encode hotspots of antiviral systems. Cell Host Microbe. 2022; 30:740–753.
- 13. Couvin D., Bernheim A., Toffano-Nioche C., Touchon M., Michalik J., Néron B., Rocha E.P.C., Vergnaud G., Gautheret D., Pourcel C. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018; 46:W246–W251.
- 14. El-Gebali S., Mistry J., Bateman A., Eddy S.R., Luciani A., Potter S.C., Qureshi M., Richardson L.J., Salazar G.A., Smart A. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019; 47:D427–D432.
- 15. Shmakov S.A., Makarova K.S., Wolf Y.I., Severinov K.V., Koonin E.V. Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis. Proc. Natl. Acad. Sci. 2018; 115:E5307–E5316.
- 16. Galperin M.Y., Kristensen D.M., Makarova K.S., Wolf Y.I., Koonin E.V. Microbial genome analysis: the COG approach. Brief. Bioinform. 2019; 20:1063–1070.
- 17. Makarova K.S., Wolf Y.I., Iranzo J., Shmakov S.A., Alkhnbashi O.S., Brouns S.J.J., Charpentier E., Cheng D., Haft D.H., Horvath P. et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 2020; 18:67–83.
- 18. Edgar R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004; 32:1792–1797.
- 19. Eddy S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 2011; 7:e1002195.
- 20. Bernheim A., Millman A., Ofir G., Meitav G., Avraham C., Shomar H., Rosenberg M.M., Tal N., Melamed S., Amitai G. et al. Prokaryotic viperins produce diverse antiviral molecules. Nature. 2021; 589:120–124.
- 21. Doron S., Melamed S., Ofir G., Leavitt A., Lopatina A., Keren M., Amitai G., Sorek R. Systematic discovery of antiphage defense systems in the microbial pangenome. Science. 2018; 359:eaar4120.
- 22. Gao L., Altae-Tran H., Böhning F., Makarova K.S., Segel M., Schmid-Burgk J.L., Koob J., Wolf Y.I., Koonin E.V., Zhang F. Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science. 2020; 369:1077–1084.
- 23. Goldfarb T., Sberro H., Weinstock E., Cohen O., Doron S., Charpak-Amikam Y., Afik S., Ofir G., Sorek R. BREX is a novel phage resistance system widespread in microbial genomes. EMBO J. 2015; 34:169–183.
- 24. Johnson A.G., Wein T., Mayer M.L., Duncan-Lowey B., Yirmiya E., Oppenheimer-Shaanan Y., Amitai G., Sorek R., Kranzusch P.J. Bacterial gasdermins reveal an ancient mechanism of cell death. Science. 2022; 375:221–225.
- 25. Makarova K.S., Wolf Y.I., van der Oost J., Koonin E.V. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol. Direct. 2009; 4:29.
- 26. Millman A., Melamed S., Amitai G., Sorek R. Diversity and classification of cyclic-oligonucleotide-based anti-phage signalling systems. Nat. Microbiol. 2020; 5:1608–1615.
- 27. Ofir G., Melamed S., Sberro H., Mukamel Z., Silverman S., Yaakov G., Doron S., Sorek R. DISARM is a widespread bacterial defence system with broad anti-phage activities. Nat. Microbiol. 2018; 3:90–98.
- 28. Roberts R.J., Vincze T., Posfai J., Macelis D. REBASE—a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 2015; 43:D298–D299.
- 29. Shah S.A., Alkhnbashi O.S., Behler J., Han W., She Q., Hess W.R., Garrett R.A., Backofen R. Comprehensive search for accessory proteins encoded with archaeal and bacterial type III CRISPR-cas gene cassettes reveals 39 new cas gene families. RNA Biol. 2019; 16:530–542.
- 30. Tal N., Millman A., Stokar-Avihail A., Fedorenko T., Leavitt A., Melamed S., Yirmiya E., Avraham C., Amitai G., Sorek R. Antiviral defense via nucleotide depletion in bacteria. 2021; bioRxiv doi: 10.1101/2021.04.26.441389, 26 April 2021, preprint: not peer reviewed.
- 31. Tal N., Morehouse B.R., Millman A., Stokar-Avihail A., Avraham C., Fedorenko T., Yirmiya E., Herbst E., Brandis A., Mehlman T. et al. Cyclic CMP and cyclic UMP mediate bacterial immunity against phages. Cell. 2021; 184:5728–5739.
- 32. Thiaville J.J., Kellner S.M., Yuan Y., Hutinet G., Thiaville P.C., Jumpathong W., Mohapatra S., Brochier-Armanet C., Letarov A.V., Hillebrand R. et al. Novel genomic island modifies DNA with 7-deazaguanine derivatives. Proc. Natl. Acad. Sci. 2016; 113:E1452–E1459.
- 33. Tong T., Chen S., Wang L., Tang Y., Ryu J.Y., Jiang S., Wu X., Chen C., Luo J., Deng Z. et al. Occurrence, evolution, and functions of DNA phosphorothioate epigenetics in bacteria. Proc. Natl. Acad. Sci. 2018; 115:E2988–E2996.
- 34. Wang S., Wan M., Huang R., Zhang Y., Xie Y., Wei Y., Ahmad M., Wu D., Hong Y., Deng Z. et al. SspABCD-SspFGH constitutes a new type of DNA phosphorothioate-based bacterial defense system. Mbio. 2021; 12:e00613-21.
- 35. Xiong L., Liu S., Chen S., Xiao Y., Zhu B., Gao Y., Zhang Y., Chen B., Luo J., Deng Z. et al. A new type of DNA phosphorothioation-based antiviral system in archaea. Nat. Commun. 2019; 10:1688.
- 36. Xiong X., Wu G., Wei Y., Liu L., Zhang Y., Su R., Jiang X., Li M., Gao H., Tian X. et al. SspABCD–SspE is a phosphorothioation-sensing bacterial defence system with broad anti-phage activities. Nat. Microbiol. 2020; 5:917–928.
- 37. Xu T., Yao F., Zhou X., Deng Z., You D. A novel host-specific restriction system associated with DNA backbone S-modification in Salmonella. Nucleic Acids Res. 2010; 38:7133–7141.
- 38. Yuan Y., Hutinet G., Valera J.G., Hu J., Hillebrand R., Gustafson A., Iwata-Reuyl D., Dedon P.C., de Crécy-Lagard V. Identification of the minimal bacterial 2′-deoxy-7-amido-7-deazaguanine synthesis machinery. Mol. Microbiol. 2018; 110:469–483.
- 39. Zeng Z., Chen Y., Pinilla-Redondo R., Shah S.A., Zhao F., Wang C., Hu Z., Zhang C., Whitaker R.J., She Q. et al. A short prokaryotic argonaute cooperates with membrane effector to confer antiviral defense. 2021; bioRxiv doi: 10.1101/2021.12.09.471704, 11 December 2021, preprint: not peer reviewed.
- 40. Anba J., Bidnenko E., Hillier A., Ehrlich D., Chopin M.C. Characterization of the lactococcal abiD1 gene coding for phage abortive infection. J. Bacteriol. 1995; 177:3818–3823.
- 41. Bergsland K.J., Kao C., Yu Y.-T.N., Gulati R., Snyder L. A site in the T4 bacteriophage major head protein gene that can promote the inhibition of all translation in Escherichia coli. J. Mol. Biol. 1990; 213:477–494.
- 42. Birkholz N., Jackson S.A., Fagerlund R.D., Fineran P.C. A mobile restriction–modification system provides phage defence and resolves an epigenetic conflict with an antagonistic endonuclease. Nucleic Acids Res. 2022; 50:3348–3361.
- 43. Bouchard J.D., Dion E., Bissonnette F., Moineau S. Characterization of the two-component abortive phage infection mechanism AbiT from Lactococcus lactis. J. Bacteriol. 2002; 184:6325–6332.
- 44. Cluzel P.J., Chopin A., Ehrlich S.D., Chopin M.C. Phage abortive infection mechanism from Lactococcus lactis subsp. lactis, expression of which is mediated by an Iso-ISS1 element. Appl. Environ. Microbiol. 1991; 57:3547–3551.
- 45. Cram D., Ray A., Skurray R. Molecular analysis of F plasmid pif region specifying abortive infection of T7 phage. Mol. Gen. Genet. MGG. 1984; 197:137–142.
- 46. Dai G., Su P., Allison G.E., Geller B.L., Zhu P., Kim W.S., Dunn N.W. Molecular characterization of a new abortive infection system (AbiU) from Lactococcus lactis LL51-1. Appl. Environ. Microbiol. 2001; 67:5225–5232.
- 47. Deng Y.-M., Liu C.-Q., Dunn N.W. Genetic organization and functional analysis of a novel phage abortive infection system, AbiL, from Lactococcus lactis. J. Biotechnol. 1999; 67:135–149.
- 48. Deng Y.-M., Harvey M.L., Liu C.-Q., Dunn N.W. A novel plasmid-encoded phage abortive infection system from Lactococcus lactis biovar. diacetylactis. FEMS Microbiol. Lett. 2006; 146:149–154.
- 49. Dinsmore P.K., Klaenhammer T.R. Phenotypic consequences of altering the copy number of abiA, a gene responsible for aborting bacteriophage infections in Lactococcus lactis. Appl. Environ. Microbiol. 1994; 60:1129–1136.
- 50. Domingues S., Chopin A., Ehrlich S.D., Chopin M.-C. The lactococcal abortive phage infection system AbiP prevents both phage DNA replication and temporal transcription switch. J. Bacteriol. 2004; 186:713–721.
- 51. Durmaz E., Higgins D.L., Klaenhammer T.R. Molecular characterization of a second abortive phage resistance gene present in Lactococcus lactis subsp. lactis ME2. J. Bacteriol. 1992; 174:7463–7469.
- 52. Dy R.L., Przybilski R., Semeijn K., Salmond G.P.C., Fineran P.C. A widespread bacteriophage abortive infection system functions through a type IV toxin–antitoxin mechanism. Nucleic Acids Res. 2014; 42:4590–4605.
- 53. Emond E., Holler B.J., Boucher I., Vandenbergh P.A., Vedamuthu E.R., Kondo J.K., Moineau S. Phenotypic and genetic characterization of the bacteriophage abortive infection mechanism AbiK from Lactococcus lactis. Appl. Environ. Microbiol. 1997; 63:1274–1283.
- 54. Emond E., Dion E., Walker S.A., Vedamuthu E.R., Kondo J.K., Moineau S. AbiQ, an abortive infection mechanism from Lactococcus lactis. Appl. Environ. Microbiol. 1998; 64:4748–4756.
- 55. Garvey P., Fitzgerald G.F., Hill C. Cloning and DNA sequence analysis of two abortive infection phage resistance determinants from the lactococcal plasmid pNP40. Appl. Environ. Microbiol. 1995; 61:4321–4328.
- 56. Jabbar M.A., Snyder L. Genetic and physiological studies of an Escherichia coli locus that restricts polynucleotide kinase- and RNA ligase-deficient mutants of bacteriophage T4. J. Virol. 1984; 51:522–529.
- 57. Lindahl G., Sironi G., Bialy H., Calendar R. Bacteriophage lambda; abortive infection of bacteria lysogenic for phage P2. Proc. Natl. Acad. Sci. 1970; 66:587–594.
- 58. McLandsborough L.A., Kolaetis K.M., Requena T., McKay L.L. Cloning and characterization of the abortive infection genetic determinant abiD isolated from pBF61 of Lactococcus lactis subsp. lactis KR5. Appl. Environ. Microbiol. 1995; 61:2023–2026.
- 59. Millman A., Bernheim A., Stokar-Avihail A., Fedorenko T., Voichek M., Leavitt A., Oppenheimer-Shaanan Y., Sorek R. Bacterial retrons function in anti-phage defense. Cell. 2020; 183:1551–1561.
- 60. O’Connor L., Coffey A., Daly C., Fitzgerald G.F. AbiG, a genotypically novel abortive infection mechanism encoded by plasmid pCI750 of Lactococcus lactis subsp. cremoris UC653. Appl. Environ. Microbiol. 1996; 62:3075–3082.
- 61. Owen S.V., Wenner N., Dulberger C.L., Rodwell E.V., Bowers-Barnard A., Quinones-Olvera N., Rigden D.J., Rubin E.J., Garner E.C., Baym M. et al. Prophages encode phage-defense systems with cognate self-immunity. Cell Host Microbe. 2021; 29:1620–1633.
- 62. Parma D.H., Snyder M., Sobolevski S., Nawroz M., Brody E., Gold L. The Rex system of bacteriophage lambda: tolerance and altruistic cell death. Genes Dev. 1992; 6:497–510.
- 63. Parreira R., Ehrlich S.D., Chopin M.-C. Dramatic decay of phage transcripts in lactococcal cells carrying the abortive infection determinant AbiB. Mol. Microbiol. 1996; 19:221–230.
- 64. Prévots F., Ritzenthaler P. Complete sequence of the new lactococcal abortive phage resistance gene abiO. J. Dairy Sci. 1998; 81:1483–1485.
- 65. Prévots F., Daloyau M., Bonin O., Dumont X., Tolou S. Cloning and sequencing of the novel abortive infection gene abiH of Lactococcus lactis ssp. lactis biovar. diacetylactis S94. FEMS Microbiol. Lett. 1996; 142:295–299.
- 66. Prévots F., Tolou S., Delpech B., Kaghad M., Daloyau M. Nucleotide sequence and analysis of the new chromosomal abortive infection gene abiN of Lactococcus lactis subsp. cremoris S114. FEMS Microbiol. Lett. 1998; 159:331–336.
- 67. Sberro H., Leavitt A., Kiro R., Koh E., Peleg Y., Qimron U., Sorek R. Discovery of functional toxin/antitoxin systems in bacteria by shotgun cloning. Mol. Cell. 2013; 50:136–148.
- 68. Smith H.S., Pizer L.I., Pylkas L., Lederberg S. Abortive infection of Shigella dysenteriae P2 by T2 bacteriophage. J. Virol. 1969; 4:162–168.
- 69. Su P., Harvey M., Im H.J., Dunn N.W. Isolation, cloning and characterisation of the abiI gene from Lactococcus lactis subsp. lactis M138 encoding abortive phage infection. J. Biotechnol. 1997; 54:95–104.
- 70. Biswas A., Staals R.H.J., Morales S.E., Fineran P.C., Brown C.M. CRISPRDetect: a flexible algorithm to define CRISPR arrays. BMC Genomics. 2016; 17:356.
- 71. Nawrocki E.P., Eddy S.R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013; 29:2933–2935.
- 72. Haft D.H., DiCuccio M., Badretdin A., Brover V., Chetvernin V., O’Neill K., Li W., Chitsaz F., Derbyshire M.K., Gonzales N.R. et al. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 2018; 46:D851–D860.
- 73. Tatusova T., DiCuccio M., Badretdin A., Chetvernin V., Nawrocki E.P., Zaslavsky L., Lomsadze A., Pruitt K.D., Borodovsky M., Ostell J. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016; 44:6614–6624.
- 74. Cock P.J.A., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009; 25:1422–1423.
- 75. Hyatt D., Chen G.-L., LoCascio P.F., Land M.L., Larimer F.W., Hauser L.J. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010; 11:119.
- 76. Zimmermann L., Stephens A., Nam S.-Z., Rau D., Kübler J., Lozajic M., Gabler F., Söding J., Lupas A.N., Alva V. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 2018; 430:2237–2243.
- 77. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583–589.
- 78. Mirdita M., Schütze K., Moriwaki Y., Heo L., Ovchinnikov S., Steinegger M. ColabFold - Making protein folding accessible to all. 2022; bioRxiv doi: 10.1101/2021.08.15.456425, 15 August 2021, preprint: not peer reviewed.
- 79. Berman H.M. The protein data bank. Nucleic Acids Res. 2000; 28:235–242.
- 80. Holm L. DALI and the persistence of protein shape. Protein Sci. 2020; 29:128–140.
- 81. Kempen M., Kim S.S., Tumescheit C., Mirdita M., Söding J., Steinegger M. Foldseek: fast and accurate protein structure search. 2022; bioRxiv doi: 10.1101/2022.02.07.479398, 09 February 2022, preprint: not peer reviewed.
- 82. Cazares A., Figueroa W., Cazares D. Diversity of microbial defence systems. Nat. Rev. Microbiol. 2022; 20:191.
- 83. Tesson F., Hervé A., Touchon M., Humières C., Cury J., Bernheim A. Systematic and quantitative view of the antiviral arsenal of prokaryotes. 2021; bioRxiv doi: 10.1101/2021.09.02.458658, 03 September 2021, preprint: not peer reviewed.
- 84. Krishnan A., Burroughs A.M., Iyer L.M., Aravind L. Comprehensive classification of ABC ATPases and their functional radiation in nucleoprotein dynamics and biological conflict systems. Nucleic Acids Res. 2020; 48:10045–10075.
- 85. Panas M.W., Jain P., Yang H., Mitra S., Biswas D., Wattam A.R., Letvin N.L., Jacobs W.R. Noncanonical SMC protein in Mycobacterium smegmatis restricts maintenance of Mycobacterium fortuitum plasmids. Proc. Natl. Acad. Sci. 2014; 111:13264–13271.
- 86. Hartigan J.A., Hartigan P.M. The dip test of unimodality. Ann. Stat. 1985; 13:70–84.
- 87. Krebes J., Morgan R.D., Bunk B., Spröer C., Luong K., Parusel R., Anton B.P., König C., Josenhans C., Overmann J. et al. The complex methylome of the human gastric pathogen Helicobacter pylori. Nucleic Acids Res. 2014; 42:2415–2432.
- 88. Lin L.F., Posfai J., Roberts R.J., Kong H. Comparative genomics of the restriction-modification systems in Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 2001; 98:2740–2745.
- 89. Barbour A.G., Gupta R.S. The family Borreliaceae (Spirochaetales), a diverse group in two genera of tick-borne spirochetes of mammals, birds, and reptiles. J. Med. Entomol. 2021; 58:1513–1524.
- 90. Moran N.A. Microbial minimalism: genome reduction in bacterial pathogens. Cell. 2002; 108:583–586.
- 91. Parks D.H., Chuvochina M., Chaumeil P.-A., Rinke C., Mussig A.J., Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat. Biotechnol. 2020; 38:1079–1086.
- 92. van Houte S., Buckling A., Westra E.R. Evolutionary ecology of prokaryotic immune mechanisms. Microbiol. Mol. Biol. Rev. 2016; 80:745–763.
- 93. Hynes A.P., Villion M., Moineau S. Adaptation in bacterial CRISPR-Cas immunity can be driven by defective phages. Nat. Commun. 2014; 5:4399.
- 94. Maguin P., Varble A., Modell J.W., Marraffini L.A. Cleavage of viral DNA by restriction endonucleases stimulates the type II CRISPR-Cas immune response. Mol. Cell. 2022; 82:907–919.
Data Availability Statement
The PADLOC web server is freely available at https://padloc.otago.ac.nz. This website is open to all users and there is no login requirement. Source code and documentation for installing and running PADLOC locally are freely available from the PADLOC GitHub repository (https://github.com/padlocbio/padloc). The HMMs and system models used by the PADLOC web server are available from the PADLOC database repository (https://github.com/padlocbio/padloc-db).




