ABSTRACT
A CRISPR locus, defined by an array of repeat and spacer elements, constitutes a genetic record of the ceaseless battle between bacteria and viruses, showcasing the genomic integration of spacers acquired from invasive DNA. In particular, iterative spacer acquisitions represent unique evolutionary histories and are often useful for high-resolution bacterial genotyping, including comparative analysis of closely related organisms, clonal lineages, and clinical isolates. Current spacer visualization methods are typically tedious and can require manual data manipulation and curation, including spacer extraction at each CRISPR locus from genomes of interest. Here, we constructed a high-throughput extraction pipeline coupled with a local web-based visualization tool which enables CRISPR spacer and repeat extraction, rapid visualization, graphical comparison, and progressive multiple sequence alignment. We present the bioinformatic pipeline and investigate the loci of reference CRISPR-Cas systems and model organisms in 4 well-characterized subtypes. We illustrate how this analysis uncovers the evolutionary tracks and homology shared between various organisms through visual comparison of CRISPR spacers and repeats, driven through progressive alignments. Due to the ability to process unannotated genome files with minimal preparation and curation, this pipeline can be implemented promptly. Overall, this efficient high-throughput solution supports accelerated analysis of genomic data sets and enables and expedites genotyping efforts based on CRISPR loci.
KEYWORDS: CRISPR-Cas, repeat detection, software, CRISPR spacer, crRNA
1. Introduction
Due to the selective pressure of viral predation, bacteria have evolved multiple defence mechanisms against bacteriophage infection [1–3]. Innate immunity is often provided by restriction-modification and abortive-infection systems, while adaptive immunity is driven by a range of diversified CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated proteins) systems [1,4]. Essential to CRISPR-Cas based adaptive immunity is a process called adaptation, in which invading foreign DNA is integrated into the CRISPR array to impede future attacks [5–13]. As a result, these repeat-spacer arrays represent a unique evolutionary record of attacks by bacteriophages and other invasive and mobile genetic elements. The industrial relevance of CRISPR-based genotyping has been demonstrated in several bacterial species widely found in the food supply chain, encompassing starter cultures and food spoilage organisms such as Streptococcus thermophilus [14], Bifidobacterium longum [15], and Lactobacillus buchneri [16]. CRISPR-based genotyping also exhibits significant medical relevance through its usage in identification of pathogens such as Mycobacterium tuberculosis [17,18], Salmonella enterica [19,20], Clostridium difficile [21], Escherichia coli and many others [22,23], including food-borne pathogens. Characterization of CRISPR-Cas systems has also revealed alternative functions such as roles in virulence regulation and DNA repair [24]. Understanding the origin and structure of CRISPR loci and their transcribed RNA (crRNA) is a shared thread running through many fields and functions: from the novice CRISPR biologist recently introduced to adaptive immunity, to the expert investigating a bacterial clade; from the genome assembler working to decode repetitive genomic regions, to the explorations of a phylogeneticist [17]. 
Though the genetic locus is the defining feature of CRISPR-Cas systems, their function hinges on CRISPR-derived RNA molecules that guide the activity and efficiency of these immune systems. Indeed, crRNA, and thus CRISPR repeats and spacers, are the fundamental basis of CRISPR-Cas-based genome editing, with RNA guides at its core. It is thus critical to discern and characterize the loci and sequences that define crRNAs. Yet, there is a surprising paucity of tools dedicated to understanding, manipulating and visualizing these loci. Pipeline analysis of the DNA sequences underlying active or ancient crRNA is a powerful tool in unsupervised genomic discovery, especially in an era where sequencing has become affordable, and high-throughput -omic technologies have yielded genomic and metagenomic sequencing data en masse. The resulting analytical bottleneck is compounded by the difficulty with which repeated regions are sequenced and assembled. Visualization and comparison of CRISPR loci bridges the fields of genome editing and CRISPR biology, providing a common denominator between disciplines. Although a handful of useful bioinformatic tools have been developed to facilitate CRISPR repeat and spacer identification, the space for large-scale discovery and comparison remains untapped [25–28]. These tools are largely centred around CRISPR-Cas detection and investigation of related features at the single-organism level, which enables deep exploration and discovery with a narrow scope, and they are not optimized for direct comparison across organisms. Often, these jobs are limited to a single genome file at a time and can take several minutes to run per strain depending on the level of detail provided for each locus. Nevertheless, we used specific features of existing CRISPR tools (such as CRT, CRISPRFinder, CRISPRCasFinder, CRISPRdb, CRISPRDetect, CRISPRone, CRISPRdisco and more) as a basis to develop a more efficient and scalable pipeline.
Direct nucleotide comparison is time consuming and cumbersome in contrast to visual comparison, which is more intuitive and better suited for comparison across multiple organisms. Here, we present an automated pipeline that combines the power and speed of current bioinformatic tools with the user-interface of the modern web-browser for improved analysis of CRISPR arrays. We use this technology to explore reference systems and model strains across the two established CRISPR classes and 4 major types, with examples for reference subtypes: I-E, II-A, III-A, and V-A [29–31]. Interestingly, results illustrate the speed at which evolutionary relationships can be recognized and explored, and provide insights into their function and activity.
2. Materials and methods
2.1. Extraction and conversion pipeline
The repeat and spacer sequence extraction is performed by MinCED (github.com/ctSkennerton/minced), which is derived from the CRISPR Recognition Tool (CRT) [26]. It is wrapped in a custom Bash script which enables parallel spacer/repeat extraction across multiple genomes using the shell’s native background job management system, drastically reducing execution time when processing large numbers of genomes. The custom script also splits loci into individual rows and converts the MinCED repeat output to .fasta files for downstream analyses. The required user input is simple, and can consist of any genome, contig(s), or sequence file in .fasta format. If a specific file is not provided, all genome files in the current working directory will be analyzed. Extracted CRISPR spacer and repeat sequences are converted into an easily comparable colour and glyphicon (getbootstrap.com/docs/3.3/components/). Each spacer or repeat is represented by two components: a coloured square, representing the composition of a subset of the sequence; and a coloured symbol, representing the composition and length of the sequence. The conversion algorithm was written in Python, adapted from a VBA algorithm originally created by Philippe Horvath and colleagues [14]. This engine outputs the conversion data into a .json file available for uptake by a local server.
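The conversion step can be illustrated with a minimal sketch. It is not the published conversion algorithm, whose exact mapping is not given here: the use of a SHA-256 digest to derive colours, the `SYMBOLS` list, the `subset_length` parameter, and the JSON field names are all assumptions chosen only to show how a sequence could be mapped deterministically to a coloured square and a coloured symbol.

```python
import hashlib
import json

# Hypothetical symbol names; the real tool draws on Bootstrap glyphicons.
SYMBOLS = ["star", "heart", "cloud", "leaf", "bell", "flag", "fire", "plane"]


def _rgb(text):
    """Derive a deterministic RGB triple from a string (assumed scheme)."""
    digest = hashlib.sha256(text.encode("ascii")).digest()
    return [digest[0], digest[1], digest[2]]


def convert_unit(sequence, subset_length=10):
    """Map one spacer or repeat to a coloured square and a coloured symbol.

    The square colour depends on a subset of the sequence, the symbol colour
    on the full sequence, and the symbol shape on the sequence length.
    """
    seq = sequence.upper()
    return {
        "sequence": seq,
        "length": len(seq),
        "square_rgb": _rgb(seq[:subset_length]),
        "symbol": SYMBOLS[len(seq) % len(SYMBOLS)],
        "symbol_rgb": _rgb(seq),
    }


def convert_array(name, units):
    """Convert every unit of one array into a JSON-serializable record."""
    return {"name": name, "units": [convert_unit(u) for u in units]}


if __name__ == "__main__":
    record = convert_array("example_locus", ["ACGTACGTACGTAA", "TTGACCA"])
    print(json.dumps(record, indent=2))
```

Because the mapping is deterministic, identical spacers in different genomes receive identical glyphs, which is the property that makes visual comparison across rows possible.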
2.2. Reference CRISPR-Cas systems for select types and subtypes
Select genomes were sampled to encompass the previously defined reference strains across the two main CRISPR-Cas system classes, as well as the four major types (I, II, III, V) and select subtypes (I-E, II-A, III-A, V-A). Additional genomes within similar species carrying these CRISPR-Cas systems were randomly selected through a RefSeq search for comparative analysis with the appropriate reference loci [29,30,35] (Table 1).
Table 1.
Strains used in pipeline analysis.
| Class | Type/Subtype | Strain | Accession |
|---|---|---|---|
| Class 1 | I-E | Escherichia coli O111:H11 str. CVM9455 | NZ_AKAX00000000.1 |
| Class 1 | I-E | Escherichia coli O26:H11 | NZ_BEVX00000000.1 |
| Class 1 | I-E | Escherichia coli K-12 strain ER3476 | NZ_CP010440.1 |
| Class 1 | I-E | Escherichia coli strain K-12 NEB 5-alpha | NZ_CP017100.1 |
| Class 1 | I-E | Escherichia coli strain STEC 690 | NZ_LOFJ00000000.1 |
| Class 1 | I-E | Escherichia coli strain 90 | NZ_NSKO00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis strain 987 SEPI | NZ_JUOW00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis 125 SEPI | NZ_JVXA00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis strain ABKWR | NZ_LYLX00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis strain A1KF08 | NZ_LYPX00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis strain Bt1p3 | NZ_LYPY00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis AOKB02 | NZ_LYPZ00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis KED12 | NZ_LYQA00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis XE_C06 | NZ_LYQB00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis ENVL_370 | NZ_LYVZ00000000.1 |
| Class 1 | III-A | Staphylococcus epidermidis Mt1p16 | NZ_MAJJ00000000.1 |
| Class 2 | II-A | Streptococcus thermophilus ND03 | NC_017563.1 |
| Class 2 | II-A | Streptococcus thermophilus strain APC151 | NZ_CP019935.1 |
| Class 2 | II-A | Streptococcus thermophilus KLDS3.1012 | NZ_LHSK00000000.1 |
| Class 2 | II-A | Streptococcus thermophilus St-10 | NZ_PHHG00000000.1 |
| Class 2 | II-A | Streptococcus thermophilus St-9 | NZ_PHHH00000000.1 |
| Class 2 | V-A | Francisella novicida Fx1 | NC_017450.1 |
| Class 2 | V-A | Francisella tularensis subsp. novicida GA99-3549 | NZ_AAYF00000000.1 |
| Class 2 | V-A | Francisella tularensis subsp. novicida F6168 | NZ_CP009353.1 |
| Class 2 | V-A | Francisella tularensis subsp. novicida strain AL97-2214 | NZ_CP009653.1 |
| Class 2 | V-A | Francisella tularensis subsp. novicida strain DPG 3A-IS | NZ_CP010103.1 |
| Class 2 | V-A | Francisella tularensis subsp. novicida strain FAI | NZ_JOOT00000000.1 |
2.3. Web-based visualization tool
The web-based visualization tool is written in HTML5, CSS3, and jQuery and is served locally on port 4444 by SimpleHTTPServer in Python 2 or http.server in Python 3. User-generated changes to the data set can be saved via HTML5 local storage or exported to disk. Our custom multiple sequence alignment algorithm is implemented in JavaScript and comprises three distinct stages of execution. The first stage uses the Needleman-Wunsch algorithm [32] to evaluate every global pairwise alignment, generating a raw similarity score for each pair of sequences from empirically selected match, mismatch, and gap penalty values of 4, −1, and −2, respectively. These values are hard-coded into multipleSeqAlignment.js and can be modified as needed. The second stage uses UPGMA [33] to recursively generate a guide tree based on the similarity matrix. Branching order is determined by the highest pairwise similarity score. If multiple maximum scores exist, one is chosen at random. Progressive alignment is implemented in stage three, based on the guide tree’s branching order, and uses a sum-of-pairs scoring approach [34]. End gaps are not penalized and are removed from the user-interface to yield a more concise output, with fewer unnecessary gaps towards the end of some sequences. While typical progressive alignments are concerned with comparison and gap insertion at the nucleotide level, we are concerned with comparison and gap insertion at the whole spacer level, enabling alignments of entire spacers as opposed to alignments of individual nucleotides. To this end, logic at the nucleotide level has been abstracted away by converting the nucleotide sequence of each spacer to an integer for speed of computational comparison, and then comparing integers as nucleotides are typically compared, resulting in a match, mismatch, or gap score. Nucleotide-to-gap (integer-to-gap) is scored as a typical gap (−2).
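The first alignment stage can be sketched as follows. The published implementation is in JavaScript (multipleSeqAlignment.js); this Python version is only an illustration, and the function names `encode_arrays`, `needleman_wunsch_score`, and `pairwise_scores` are hypothetical. It uses the match, mismatch, and gap values reported above (4, −1, −2) and, as a simplifying assumption, penalizes end gaps like any other gap, which is the classic Needleman-Wunsch behaviour.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 4, -1, -2  # values reported in the text


def encode_arrays(arrays):
    """Map each distinct spacer sequence to an integer ID, array by array."""
    ids = {}
    encoded = []
    for spacers in arrays:
        encoded.append([ids.setdefault(s, len(ids)) for s in spacers])
    return encoded


def needleman_wunsch_score(a, b):
    """Global alignment score of two integer-encoded spacer arrays.

    Only the previous row of the dynamic programming table is kept,
    because the score alone is needed for the similarity matrix.
    """
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def pairwise_scores(encoded):
    """Score every pair of arrays; the result feeds guide-tree construction."""
    return {
        (i, j): needleman_wunsch_score(encoded[i], encoded[j])
        for i, j in combinations(range(len(encoded)), 2)
    }


if __name__ == "__main__":
    arrays = [["AAA", "CCC", "GGG"], ["AAA", "GGG"], ["TTT"]]
    print(pairwise_scores(encode_arrays(arrays)))
```

Comparing integers rather than nucleotide strings means each cell of the table costs a single equality test, regardless of spacer length.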
The gap-to-gap score of −3 was determined empirically after observing the tendency for gap-to-gap alignments to be favoured and inserted in long stretches when its score was set to anything greater than or equal to the singular gap score of −2.
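To make the scoring scheme concrete, the sketch below scores one alignment column with a sum-of-pairs approach using the four values given in the text (match 4, mismatch −1, gap −2, gap-to-gap −3). The names `pair_score` and `column_score`, and the use of `None` to represent a gap, are assumptions for illustration and are not taken from the published JavaScript code.

```python
from itertools import combinations

MATCH, MISMATCH, GAP, GAP_GAP = 4, -1, -2, -3  # values reported in the text
GAP_SYMBOL = None  # assumed representation of a gap in an aligned row


def pair_score(x, y):
    """Score one pair of aligned cells (integer-encoded spacers or gaps)."""
    if x is GAP_SYMBOL and y is GAP_SYMBOL:
        return GAP_GAP
    if x is GAP_SYMBOL or y is GAP_SYMBOL:
        return GAP
    return MATCH if x == y else MISMATCH


def column_score(column):
    """Sum-of-pairs score: add the scores of every pair of cells in a column."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))


if __name__ == "__main__":
    print(column_score([5, 5, None]))     # two identical spacers and one gap
    print(column_score([None, None, 5]))  # two gaps and one spacer
```

With the gap-to-gap score set below the single gap score, a column that stacks gaps on gaps always scores worse than one that pairs a gap with a spacer, which discourages the long gap runs described above.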
Additional visual manipulation is possible in the user-interface, including row and spacer sorting, reverse-complementation, and spacer/repeat view toggles. This software can be installed from the command-line and is also available as a Docker image. Command-line installation requires Bash, Java, Python, pip, Biopython, and Numpy. To download the pipeline and for additional detail on installation and use, please see the repository at (github.com/CRISPRlab/CRISPRviz).
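For readers unfamiliar with Python's built-in web server, the local serving step mentioned in this section can be sketched as below for Python 3. This is not the pipeline's own launcher; the function name `make_server` and the choice to bind only to the loopback address are assumptions, while port 4444 is the value stated in the text.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", port=4444):
    """Create a static-file server for `directory` on the loopback interface.

    The caller is responsible for starting it with serve_forever() and for
    releasing the socket with server_close().
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Typical use (blocks until interrupted):
#     server = make_server(directory="path/to/web/files")
#     server.serve_forever()
```

Binding to 127.0.0.1 keeps the visualization reachable only from the local machine, which matches the local-only use case described here.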
2.4. Performance evaluation
Executions of 1, 10, and 100 genomes were normalized by using copies of a single genome: NC_015428. This genome was chosen based on its average size (~2.5 Mb) and familiarity. Each sample run was executed 5 times (n = 5). The machine used for execution was a 2016 MacBook Pro laptop with 8 GB RAM and a 2.7 GHz Intel Core i5 processor. Subsequent runs on more powerful processors yielded significantly shorter execution times; however, the conservative execution times were chosen for display as they are more likely to represent the hardware used by typical end users.
3. Results
3.1. Bioinformatic pipeline overview
CRISPR Visualizer comprises two main components: the extraction pipeline and conversion engine, and the user-facing web-based frontend. The extraction pipeline and conversion engine consist of a string searching algorithm that identifies CRISPR spacers and repeats through MinCED, as well as a conversion engine that converts sequence composition and length into RGB values and a corresponding symbol, respectively (Fig. 1A). The web-interface comprises a ‘File actions’ menu, a display area where rows of spacers and repeats are located, individual row action buttons, and a bank of action buttons that apply to the entire display area (Fig. 1B). The ‘File actions’ menu (Fig. 1C) includes saving and loading from local browser storage, import and export to disk in .json format, as well as a management feature which lists all saved browser-based entries (Fig. 1D). The row display area exhibits the spacers and repeats of a single genome per row or, alternatively, an individual CRISPR locus per row when the pipeline is run with the split (-x) option. Each row and each individual spacer is draggable, enabling fine tuning of the alignment through manual row rearrangement and gap manipulation. Adjacent to each row is a group of action buttons that apply specifically to the repeats and spacers of that row. This button collection allows the manual addition and removal of gaps (represented by an ‘x’ icon), row reversal, reverse-complementation, and row deletion (Fig. 1E). Beneath the row display area is a similar bank of buttons whose actions apply to all rows collectively. The first row of these buttons includes two sort features (by ascending and descending length), a direction reversal toggle, and a reverse-complement toggle. The second row includes a progressive multiple sequence alignment function for spacer alignment, and toggle buttons to show spacers, repeats, or spacers and repeats simultaneously (Fig. 2).
Figure 1.

Overview of bioinformatic pipeline. A) The pipeline extracts spacer and repeat sequences from .fasta files, which are converted into unique colour/shape combinations representing the sequence composition and length of each unit. B) The web-interface contains a File actions menu, a row display area, and a bank of action buttons that control the current view of the display area. C) The File actions menu allows saving and loading to local browser storage, a local storage management feature, and import and export to disk. D) The ‘Manage’ menu, accessed from the File actions menu, displays items saved locally in browser storage. E) Each row contains action buttons that can perform a specific task on that row: add or remove gaps, reverse the row direction, show each item’s reverse complement, and delete the row.
Figure 2.

Action buttons that control the overall display area. The bottom of the web-interface contains a bank of action buttons that allow manipulation of the rows in the display area. The first row of buttons allows sorting rows by length, direction reversal, and displaying the reverse complement. The second row of buttons allows multiple sequence alignment and toggles to show spacers, repeats, or repeats and spacers together.
3.2. Reference type and subtype analysis
Reference and model strains and genomes, representing both Class 1 and Class 2 across the four major CRISPR-Cas system types and 4 model subtypes, were analyzed using the pipeline – encompassing 4 species and 27 strains in total: Escherichia coli as the Type I-E reference, Streptococcus thermophilus as the Type II-A reference, Staphylococcus epidermidis as the Type III-A reference, and Francisella novicida as the Type V-A model [31]. Importantly, these specific systems were selected because they have been defined as the canonical references for these subtypes [31], and some have been extensively used as models to unravel the molecular mechanisms of action of CRISPR-Cas systems (i.e. E. coli for Type I-E; S. thermophilus for Type II-A, S. epidermidis for Type III-A, and F. novicida for Type V-A).
Type I-E – Across 6 strains of Escherichia coli, including reference strain K-12, only a single Type I-E system was found. 4 conserved spacer arrays were identified, as well as 2 unique spacer arrays, both belonging to NZ_NSKO00000000.1 (Fig. 3A). Repeats from NZ_BEVX00000000.1, NZ_LOFJ00000000.1, and NZ_NSKO00000000.1 display a modified terminal repeat shown in yellow, which is common and instrumental in determining locus orientation. Repeats directly adjacent to Type I-E cas genes are highly similar, while repeats that are located ~25 kb from the Type I-E cas genes show slight variation. These repeats likely evolved from a single ancestral array and were physically split over time.
Type II-A – The 5 selected strains of Streptococcus thermophilus yielded repeats and spacers from 3 distinct CRISPR-Cas systems: 2 distinct Type II-A systems and a Type III-A system. 3 clusters of conserved spacer arrays were found, as well as 3 unique spacer arrays belonging to NZ_PHHG00000000.1, NZ_PHHH00000000.3, and NZ_PHHH00000000.2 (Fig. 3B), illustrating their distinct genotypic characteristics that may be reflective of their individual environments. There are 3 distinct groups of repeats, each in close proximity to a different CRISPR-Cas system. There is significant variation among these 3 repeat groups, indicating an independent evolutionary association with the most proximal CRISPR-Cas system. Repeats belonging to the Type II-A Locus 2 system and the Type III-A system each have a modified terminal repeat. The wide range of diversity in both spacer and array length could warrant additional inquiry into the level of CRISPR activity and the organism’s biological niche.
Figure 3.

Visualization and alignment of select model systems. A) Alignment of Type I-E model strain Escherichia coli K-12 spacers reveals 4 conserved families of loci and 2 unique spacer arrays. Repeat alignment shows distinct repeat groups that share repeat sequence and length, and are split by position relative to cas genes. B) Type II-A model species Streptococcus thermophilus spacer analysis reveals a diverse range of length and sequence composition. Repeat alignment displays 3 distinct spacer and repeat groups. C) Alignment of Type III-A model species Staphylococcus epidermidis spacers displays 2 conserved families of loci and one unique array. Repeat analysis shows 2 distinct repeat groups. D) Alignment of Type V-A model strain Francisella cf. novicida Fx1 spacers yields 3 conserved locus families and 2 unique loci. Two individual repeat groups can be seen after alignment.
Type III-A – Analysis of 10 Staphylococcus epidermidis genomes revealed a single Type III-A system. 2 spacer arrays were highly conserved while a single unique spacer array was identified in NZ_JUOW00000000.1 (Fig. 3C). Repeats are clustered into 2 highly conserved groups: the first is proximally associated with Type III-A cas genes, and the second lacks associated cas genes. There is some variation in the repeat sequences between these two groups, as well as physical distance, indicating initial evolution from a single ancestral array followed by point mutations over time. Almost all repeat arrays contain a single modified terminal repeat, providing an evident indication of locus orientation.
Type V-A – We analyzed 6 strains of Francisella novicida, revealing 2 distinct CRISPR-Cas systems: a Type V-A system and a Type II-B system. 3 highly conserved spacer arrays were identified, followed by 2 unique spacer arrays which were both found in the reference strain NC_017450 (Fig. 3D). 2 distinct repeat groups emerge when arranged by sequence variation, each belonging to a respective CRISPR-Cas system. All Type V-A repeat arrays contain a modified terminal repeat, while only 1 Type II-B repeat array contains a modified terminal repeat.
3.3. Performance evaluation
To measure execution time, the pipeline was run against 1, 10, and 100 genomes and elapsed seconds necessary to complete the extraction and conversion process were recorded (Fig. 4). Each batch of genomes (1, 10, or 100) was run through the pipeline a total of 5 times to assess replicative consistency. The average execution time for a single genome was 2.93 seconds, while the average time until completion for 10 genomes was 10.72 seconds. To test a large-scale batch execution, 100 genomes were processed in parallel – resulting in an average of 114.38 seconds before total completion.
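The scaling behaviour implied by these averages can be made explicit with a short calculation. The sketch below uses only the three mean execution times reported above; the "serial estimate" is an assumption (it supposes that processing genomes one at a time would cost the single-genome time for each genome, start-up overhead included), so the resulting ratio is indicative only.

```python
# Mean execution times in seconds, as reported in the text (n = 5 runs each).
MEAN_SECONDS = {1: 2.93, 10: 10.72, 100: 114.38}


def seconds_per_genome(batch_size):
    """Average wall-clock seconds spent per genome in a batch."""
    return MEAN_SECONDS[batch_size] / batch_size


def speedup_over_serial(batch_size):
    """Ratio of an assumed serial estimate to the measured batch time."""
    serial_estimate = MEAN_SECONDS[1] * batch_size
    return serial_estimate / MEAN_SECONDS[batch_size]


if __name__ == "__main__":
    for n in sorted(MEAN_SECONDS):
        print(n, round(seconds_per_genome(n), 2), round(speedup_over_serial(n), 2))
```

On these figures, the per-genome cost falls from 2.93 seconds for a single genome to roughly 1.07 and 1.14 seconds for the 10- and 100-genome batches.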
Figure 4.

Execution time. To demonstrate the pipeline’s ability to scale, batches of 1, 10, and 100 genomes were analyzed and the seconds until completion were recorded.
4. Discussion
Successful execution of the pipeline presented here depends entirely on the output generated from MinCED extraction. At times, repeated nucleotides in spacer sequences or poorly assembled genomes can cause the spacer/repeat boundary to be shifted, misrepresenting the spacer and repeat lengths in the web-interface. Spacers and repeats affected by this issue can be manually corrected in the underlying extraction files; however, the issue is best avoided by using high quality genomes.
Analysis of strains containing CRISPR arrays from reference subtypes I-E, II-A, III-A, and V-A resulted in readily identifiable conserved and distinct loci across subtype genomes. Recognition of closely related strains through CRISPR-based genotyping was facilitated through the application of multiple sequence alignment. This custom implementation of a progressive alignment algorithm is optimized to support genotypic identification through spacer and repeat analysis and is best suited for use against somewhat similar to highly similar spacer sequences. The current alignment model uses a single pass for tree building and pairwise alignment scoring, and could benefit from iterative refinement, as well as a weighted sum-of-pairs approach for improved tree building. Bacterial strains that share highly conserved spacer arrays can be located quickly and assessed for putative CRISPR activity in silico based on the presence of additional spacers, and thus acquisition events, relative to genetically similar strains. Orientation of the CRISPR locus can also be determined in cases where the terminal repeat is modified (it often carries a mutation at its 3ʹ end), as acquisition is strictly directional and occurs only at one end of the locus. However, additional inspection of the relative locations of cas1 and cas2 genes may also be used for further verification [5,36]. Of course, when transcriptional data is available, orientation and boundaries of RNA transcripts are also critical and valuable in these analyses. The development of a robust orientation detection function could be extremely advantageous for future discovery, and RNA studies underway will play a critical role in shaping the next iteration of this pipeline. CRISPR repeat and spacer lengths typically cluster around a canonical standard based on their respective subtype. Visualizing this signature is particularly helpful when mining the burgeoning quantities of uncharacterized metagenomic samples for CRISPR loci, supporting analysis of currently known systems and promoting discovery of novel systems.
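The orientation reasoning discussed above, in which a modified terminal repeat marks one end of the array, can be expressed as a simple heuristic. The sketch below is hypothetical and is not part of the pipeline, which does not yet include an orientation detection function: the name `infer_terminal_end`, the use of the most frequent repeat as the consensus, and the minimum of three repeats are all assumptions.

```python
from collections import Counter


def infer_terminal_end(repeats):
    """Guess which end of a repeat array carries the modified terminal repeat.

    Returns "first" or "last" when exactly one end repeat differs from the
    most common repeat sequence, and None when the array is too short or
    the evidence is ambiguous (neither end or both ends differ).
    """
    if len(repeats) < 3:
        return None
    consensus, _ = Counter(repeats).most_common(1)[0]
    first_modified = repeats[0] != consensus
    last_modified = repeats[-1] != consensus
    if first_modified == last_modified:
        return None
    return "first" if first_modified else "last"


if __name__ == "__main__":
    array = ["GTTTTAGA", "GTTTTAGA", "GTTTTAGA", "GTTTTAGT"]
    print(infer_terminal_end(array))
```

Under this heuristic, the end opposite the reported one would be the candidate acquisition end; as noted in the text, the positions of cas1 and cas2 or transcriptional data would still be needed to confirm the call.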
The brief execution time for a substantial number of genomes in parallel supports high-throughput analysis; however, visually evaluating such a considerable number of arrays from distantly related organisms may not be especially advantageous, particularly when performing progressive alignment. In these cases, the spacer and repeat files generated by the pipeline serve as a practical starting point for deeper analysis of the individual locus or organism, typically represented as a text-based table generated by spreadsheet tools such as Excel, or as an SQL-based report created from standalone local databases such as PostgreSQL or MySQL.
5. Conclusion
The availability of genomic sequencing data has increased at a staggering rate and the need for automated bioinformatic tools for high-power analysis has never been greater, especially with the rise of CRISPR. Current methods for identification, comparison, and visualization of CRISPR loci across various organisms are time consuming and typically lack support and capacity for high-throughput analysis. This pipeline facilitates swift implementation and provides the necessary framework for the rapid generation of large amounts of spacer and repeat data across many organisms in parallel, as well as subsequent comparison through visualization and alignment. It is our hope that this tool will provide a convenient and effective platform for rapid CRISPR analysis across academia and industry alike; across the nascent field of CRISPR biology and throughout the accomplished arena of genome engineering. The pipeline could serve as an epidemiological tool, facilitating tracking of microevolution in diverging pathogenic strains, as well as a genomic tool, providing ancestral clues to bolster genomic and phylogenetic reconstruction from metagenomic samples. As the importance and necessity of CRISPR-Cas-based genome engineering continues to grow, a proficient understanding of CRISPR loci and associated crRNA in particular is paramount, given the importance of their sequences, structures, and transcriptional profiles, as they guide CRISPR effector nucleases to provide immunity in their native host, and enable a variety of applications when engineered.
Funding Statement
The authors acknowledge support from NC State University and the NC Ag Foundation.
Acknowledgments
We would like to thank Philippe Horvath and colleagues for the initial development and implementation of the Excel-based CRISPR Spacer Macro, Claudio Hidalgo Cantabrana and Katelyn Brandt for their time, testing, and input throughout the development of the pipeline, as well as Cassandra Cañez for her graphic design expertise. This research was funded by NCSU and the NC Ag Foundation.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
References
- 1.Doron S, Melamed S, Ofir G, et al. Systematic discovery of antiphage defense systems in the microbial pangenome. Science. 2018. January 25 DOI: 10.1126/science.aar4120 PubMed PMID: 29371424. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Samson JE, Magadan AH, Sabri M, et al. Revenge of the phages: defeating bacterial defences. Nat Rev Microbiol. 2013. October;11(10):675–687. PubMed PMID: 23979432. [DOI] [PubMed] [Google Scholar]
- 3.Makarova KS, Wolf YI, Koonin EV.. Comparative genomics of defense systems in archaea and bacteria. Nucleic Acids Res. 2013. April;41(8):4360–4377. PubMed PMID: 23470997; PubMed Central; PMCID: PMCPMC3632139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Labrie SJ, Samson JE, Moineau S. Bacteriophage resistance mechanisms. Nat Rev Microbiol. 2010. May;8(5):317–327. PubMed PMID: 20348932. [DOI] [PubMed] [Google Scholar]
- 5.Deveau H, Barrangou R, Garneau JE, et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J Bacteriol. 2008. February;190(4):1390–1400. PubMed PMID: 18065545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, et al. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005. February;60(2):174–182. PubMed PMID: 15791728. [DOI] [PubMed] [Google Scholar]
- 7.Barrangou R, Fremaux C, Deveau H, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007. March 23;315(5819):1709–1712. PubMed PMID: 17379808. [DOI] [PubMed] [Google Scholar]
- 8.Bolotin A, Quinquis B, Sorokin A, et al. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology. 2005. August;151(Pt 8):2551–2561. PubMed PMID: 16079334. [DOI] [PubMed] [Google Scholar]
- 9.Pourcel C, Salvignol G, Vergnaud G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology. 2005. March;151(Pt 3):653–663. PubMed PMID: 15758212. [DOI] [PubMed] [Google Scholar]
- 10.Barrangou R, Coute-Monvoisin AC, Stahl B, et al. Genomic impact of CRISPR immunization against bacteriophages. Biochem Soc Trans. 2013. December;41(6):1383–1391. [DOI] [PubMed] [Google Scholar]
- 11.Paez-Espino D, Morovic W, Sun CL, et al. Strong bias in the bacterial CRISPR elements that confer immunity to phage. Nat Commun. 2013;4:1430. [DOI] [PubMed] [Google Scholar]
- 12.Datsenko KA, Pougach K, Tikhonov A, et al. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nat Commun. 2012. July 10;(3):945 PubMed PMID: 22781758. [DOI] [PubMed] [Google Scholar]
- 13.Swarts DC, Mosterd C, van Passel MW, et al. CRISPR interference directs strand specific spacer acquisition. PLoS One. 2012;7(4):e35888 PubMed PMID: 22558257; PubMed Central PMCID: PMCPMC3338789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Horvath P, Romero DA, Coute-Monvoisin AC, et al. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J Bacteriol. 2008. February;190(4):1401–1412. PubMed PMID: 18065539; PubMed Central; PMCID: PMCPMC2238196. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Hidalgo-Cantabrana C, Ab C, Sanchez B, et al. Characterization and Exploitation of CRISPR Loci in Bifidobacterium longum. Front Microbiol. 2017;8:1851 PubMed PMID: 29033911; PubMed Central; PMCID: PMCPMC5626976. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Briner AE, Barrangou R. Lactobacillus buchneri genotyping on the basis of clustered regularly interspaced short palindromic repeat (CRISPR) locus diversity. Appl Environ Microbiol. 2014. February;80(3):994–1001. PubMed PMID: 24271175; PubMed Central; PMCID: PMCPMC3911191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Mokrousov I, Rastogi N. Spacer-Based Macroarrays for CRISPR Genotyping. Methods Mol Biol. 2015;1311:111–131. PubMed PMID: 25981469. [DOI] [PubMed] [Google Scholar]
- 18.Zhang J, Abadia E, Refregier G, et al. Mycobacterium tuberculosis complex CRISPR genotyping: improving efficiency, throughput and discriminative power of ‘spoligotyping’ with new spacers and a microbead-based hybridization assay. J Med Microbiol. 2010. March;59(Pt 3):285–294. PubMed PMID: 19959631. [DOI] [PubMed] [Google Scholar]
- 19.Fabre L, Zhang J, Guigon G, et al. CRISPR typing and subtyping for improved laboratory surveillance of Salmonella infections. PLoS One. 2012;7(5):e36995 PubMed PMID: 22623967; PubMed Central; PMCID: PMCPMC3356390. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Li Q, Wang X, Yin K, et al. Genetic analysis and CRISPR typing of Salmonella enterica serovar Enteritidis from different sources revealed potential transmission from poultry and pig to human. Int J Food Microbiol. 2018. February 2;266:119–125. PubMed PMID: 29212058.
- 21. Andersen JM, Shoup M, Robinson C, et al. CRISPR diversity and microevolution in Clostridium difficile. Genome Biol Evol. 2016. September 19;8(9):2841–2855. PubMed PMID: 27576538; PubMed Central PMCID: PMC5630864.
- 22. Delannoy S, Beutin L, Fach P. Improved traceability of Shiga-toxin-producing Escherichia coli using CRISPRs for detection and typing. Environ Sci Pollut Res Int. 2016. May;23(9):8163–8174. PubMed PMID: 26449676.
- 23. Barrangou R, Dudley EG. CRISPR-based typing and next-generation tracking technologies. Annu Rev Food Sci Technol. 2016;7:395–411. PubMed PMID: 26772411.
- 24. Westra ER, Buckling A, Fineran PC. CRISPR-Cas systems: beyond adaptive immunity. Nat Rev Microbiol. 2014. May;12(5):317–326. PubMed PMID: 24704746.
- 25. Biswas A, Staals RH, Morales SE, et al. CRISPRDetect: A flexible algorithm to define CRISPR arrays. BMC Genomics. 2016. May 17;17:356. PubMed PMID: 27184979; PubMed Central PMCID: PMC4869251.
- 26. Bland C, Ramsey TL, Sabree F, et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007. June 18;8:209. PubMed PMID: 17577412; PubMed Central PMCID: PMC1924867.
- 27. Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007. July;35(Web Server issue):W52–7. PubMed PMID: 17537822; PubMed Central PMCID: PMC1933234.
- 28. Grissa I, Vergnaud G, Pourcel C. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics. 2007. May 23;8:172. PubMed PMID: 17521438; PubMed Central PMCID: PMC1892036.
- 29. Makarova KS, Zhang F, Koonin EV. SnapShot: class 1 CRISPR-Cas Systems. Cell. 2017. February 23;168(5):946–946.e1. PubMed PMID: 28235204.
- 30. Makarova KS, Zhang F, Koonin EV. SnapShot: class 2 CRISPR-Cas Systems. Cell. 2017. January 12;168(1–2):328–328.e1. PubMed PMID: 28086097.
- 31. Makarova KS, Wolf YI, Alkhnbashi OS, et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol. 2015. November;13(11):722–736. PubMed PMID: 26411297; PubMed Central PMCID: PMC5426118.
- 32. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970. March;48(3):443–453. PubMed PMID: 5420325.
- 33. Sokal RR, Michener CD. A statistical method for evaluating systematic relationships. Univ Kans Sci Bull. 1958;38:1409–1438.
- 34. Thompson JD, Plewniak F, Poch O. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 1999. July 1;27(13):2682–2690. PubMed PMID: 10373585; PubMed Central PMCID: PMC148477.
- 35. Koonin EV, Makarova KS, Zhang F. Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol. 2017. June;37:67–78. PubMed PMID: 28605718; PubMed Central PMCID: PMC5776717.
- 36. van der Oost J, Westra ER, Jackson RN, et al. Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol. 2014. July;12(7):479–492. PubMed PMID: 24909109; PubMed Central PMCID: PMC4225775.
