BioMed Research International. 2014 Jun 26;2014:742482. doi: 10.1155/2014/742482

SSFinder: High Throughput CRISPR-Cas Target Sites Prediction Tool

Santosh Kumar Upadhyay 1,*, Shailesh Sharma 1,*
PMCID: PMC4095993  PMID: 25089276

Abstract

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite the high demand for this system, finding a reliable tool for determining specific target sites in large genomic datasets has remained challenging. Here, we report SSFinder, a Python script that performs high-throughput detection of specific target sites in large nucleotide datasets. SSFinder is a user-friendly tool, compatible with the Windows, Mac OS, and Linux operating systems, and freely available online.

1. Introduction

Genome editing is a very useful technology in research areas related to functional genomics. Programmable nucleases such as zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) are well-known tools for targeted genome editing [1, 2]. Similarly, the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system enables genome engineering in animals and plants [3–5]. This is an RNA-guided system consisting of a short guide RNA (gRNA) and a Cas9 protein. The gRNA contains a 20-nucleotide targeting sequence (known as the spacer), which binds to the complementary strand of the target DNA by base-pairing, and a 79-nucleotide conserved sequence that forms a specific hairpin-like fold. The gRNA interacts with the Cas9 protein to form an active ribonucleoprotein complex. The complex binds to the target sequence by base-pairing and cleaves the dsDNA at a specific position [6]. A short conserved motif, “NGG” (known as the protospacer adjacent motif, PAM), located 3′ (downstream) of the target sequence is also reported to be necessary for cleavage [7].

The CRISPR-Cas system is simpler in design than ZFNs and TALENs and is highly effective even in large and complex genomes such as the 17 Gb allohexaploid wheat genome [5]. Some off-target binding of the system has been reported [8], which can be avoided by selecting a highly specific 12-nucleotide seed sequence in the 3′ region of the spacer [4, 9]. Besides dsDNA cleavage, the CRISPR-Cas system has recently been modified for nicking and for repression of gene activity [10]. It is also reported to be an effective tool for activation of gene expression [11]. Owing to these features, the system is expected to play an important role in genome engineering programs. Therefore, a dedicated tool for determining specific CRISPR-Cas target sites is an important requirement for many research groups.

Detection of a CRISPR-Cas target site is conceptually simple: it requires direct analysis of DNA sequences for the presence of specific 23-nucleotide segments (including the “NGG” PAM at the 3′ end). A few tools, such as CRISPR Design (http://www.broadinstitute.org/mpg/crispr_design/, http://crispr.mit.edu/), CRISPR Target (http://bioanalysis.otago.ac.nz/CRISPRTarget/crispr_analysis.html), and ZiFiT Targeter (http://zifit.partners.org/ZiFiT/ChoiceMenu.aspx), have recently been reported for the determination of target sites [12]. However, most of these tools are limited to the analysis of a small number or size of sequences and cannot be modified according to the user's needs. Further, they are web-based tools that depend on internet connectivity. Some of the tools, such as CRISPR Design (http://crispr.mit.edu/), are limited to model genomes only (see Supplementary Table 1 in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/742482). Therefore, a simple, easy-to-edit, and high-throughput computational tool is required for the analysis of large datasets on a local machine.

Here, we present SSFinder, freeware written in Python that finds specific CRISPR-Cas target sites in limited time, even on a personal computer. It is organism-independent and available at https://code.google.com/p/ssfinder/ under the MIT license.

2. Materials and Methods

SSFinder is a Python script that runs on the Windows, Mac OS, and Linux operating systems. It is a tool with low memory demand for finding specific CRISPR-Cas target sites. It can be installed both on personal computers and on parallel computing systems (commonly known as “clusters”) dedicated to genome research. It needs only a compatible version of Python (2.2 or higher) installed on the system.

A flow chart of the SSFinder algorithm is given in Figure 1. The DNA sequence is first analyzed for the occurrence of 23-nucleotide segments (including the “NGG” PAM at the 3′ end). These segments are then screened for 12-nucleotide seed sequences that are distinct within the input sequence data. To simplify selection further, the selected sequences are classified into four groups on the basis of their start and end nucleotides: (1) G/C N7S11 G/C, (2) G/C N7S11 A/T, (3) A/T N7S11 A/T, and (4) A/T N7S11 G/C (N denotes any nucleotide and S a seed-sequence nucleotide).
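The four-group classification can be illustrated with a minimal Python sketch. This is not the SSFinder source code: the function names `classify_spacer` and `seed_of` are hypothetical, the code targets Python 3, and it assumes that the 12-nucleotide seed is the 3′-most 12 nucleotides of the 20-nucleotide spacer (the S11 block plus the final nucleotide of the motif).

```python
def classify_spacer(spacer):
    """Label a 20-nt spacer by its first and last nucleotide (hypothetical helper)."""
    spacer = spacer.upper()
    if len(spacer) != 20:
        raise ValueError("expected a 20-nucleotide spacer")
    # G or C at an end gives the 'G/C' label; A or T gives 'A/T'.
    first = "G/C" if spacer[0] in "GC" else "A/T"
    last = "G/C" if spacer[-1] in "GC" else "A/T"
    return f"{first} N7S11 {last}"


def seed_of(spacer):
    """Return the assumed 12-nt seed: the 3'-most 12 nt of the spacer."""
    return spacer.upper()[-12:]


# Spacers taken from the 'Specific CRISPR-target site' column of Table 1.
for spacer in ("ACTTCTTCGTCCAACTTCTT", "CTAACCGACCTTCAGCTAAC"):
    print(spacer, classify_spacer(spacer), seed_of(spacer))
```

Applied to the spacers listed in Table 1, this labeling reproduces the groups shown in the "Condition with seed sequences" column.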

Figure 1. A flow chart showing the algorithm of SSFinder for CRISPR-Cas target site prediction.

To use SSFinder, users need to download the script, or copy and paste it into a text editor, and save it as “ssfinder.py” in the directory hosting the Python executable (for example, C:\Python). A working directory containing an input file of FASTA-formatted nucleotide sequences is also required. SSFinder can then be executed with the following command in a Linux terminal.

  • $ python SSFinder.py

For Windows users, an overview of executing SSFinder from the command prompt is shown in Figure 2. When prompted by SSFinder, the user provides the path of the working directory, the file name of the input sequences, and the desired output format (.xls, .txt, or .csv). The output file is saved automatically in the same directory under the file name “SSFinder-output.” In this file, SSFinder reports results in seven columns: (1) identifier of the sequence, (2) classification into four groups based on the start and end nucleotides, (3) potential target site, including the “NGG” PAM sequence, (4) start and (5) end position of the target site, (6) condition with the specific 12-nucleotide seed sequence, and (7) specific CRISPR-Cas target site (Table 1). Sequences from the seventh column can be used directly as spacers in gRNA design.

Figure 2. An overview of the command prompt for using SSFinder in the Windows operating system.

Table 1. An overview of the output file given by SSFinder.

Identifier | Classification | CRISPR-target site with “NGG” PAM | Start position | End position | Condition with seed sequences | Specific CRISPR-target site
Sequence_1 | A or T N18 A or T | ACTTCTTCGTCCAACTTCTTCGG | 6 | 28 | A or T N7S11 A or T | ACTTCTTCGTCCAACTTCTT
Sequence_1 | A or T N18 G or C | TTTGCAAGCCTCATCCATTGTGG | 386 | 408 | A or T N7S11 G or C | TTTGCAAGCCTCATCCATTG
Sequence_1 | G or C N18 A or T | CCAAAGTTCTATTTGAGCTAAGG | 68 | 90 | G or C N7S11 A or T | CCAAAGTTCTATTTGAGCTA
Sequence_1 | G or C N18 G or C | CTAACCGACCTTCAGCTAACAGG | 154 | 176 | G or C N7S11 G or C | CTAACCGACCTTCAGCTAAC
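
As an illustration of the input and output handling described above, the following Python 3 sketch reads FASTA-formatted text and writes the seven columns of Table 1 as comma-separated values. It is an assumption-laden sketch rather than the SSFinder implementation: `read_fasta`, `write_rows`, and `COLUMNS` are hypothetical names, the column headers are paraphrased from Table 1, and the rows are assumed to arrive as already detected (identifier, start, end, 23-nt site) tuples.

```python
import csv
import io

# Column headers paraphrased from Table 1 (hypothetical constant).
COLUMNS = ["Identifier", "Classification", "CRISPR-target site with NGG PAM",
           "Start", "End", "Condition with seed sequences",
           "Specific CRISPR-target site"]


def read_fasta(handle):
    """Parse FASTA text into a dict mapping header line -> uppercase sequence."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def write_rows(rows, handle):
    """Write (identifier, start, end, 23-nt site) tuples in the seven-column layout."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    for name, start, end, site in rows:
        spacer = site[:20]  # the site minus its 3-nt PAM
        first = "G/C" if spacer[0] in "GC" else "A/T"
        last = "G/C" if spacer[-1] in "GC" else "A/T"
        writer.writerow([name, f"{first} N18 {last}", site, start, end,
                         f"{first} N7S11 {last}", spacer])


# Demo with the first row of Table 1, written to an in-memory buffer.
buffer = io.StringIO()
write_rows([("Sequence_1", 6, 28, "ACTTCTTCGTCCAACTTCTTCGG")], buffer)
print(buffer.getvalue())
```

Passing an open file object instead of the in-memory buffer would write the same rows to disk.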

3. Results and Discussion

SSFinder facilitates the prediction of CRISPR-Cas target sites in small as well as large genomic datasets. It is freeware for researchers and can be used on personal computers of any configuration. Compatibility with operating systems such as Windows, Mac OS, and Linux makes the tool user-friendly for researchers from non-bioinformatics backgrounds as well.

SSFinder scans a DNA sequence by moving a window of 23 nucleotides with a step size of one nucleotide. Slices with a 3′ “NGG” PAM sequence are selected and analyzed for their 12-nucleotide seed sequences. Since the seed sequence determines the specificity of the CRISPR-Cas system [4, 5, 9], the tool ensures that these sequences are not repeated anywhere in the input genome data [13]. Sequences whose seed sequence is distinct across the entire input sequence file are reported in the output file as specific target sites. Researchers sometimes need to target genomic regions that start and end with specific nucleotides (such as A, T, G, or C). Therefore, the selected slices are further classified into four different motifs to ease the process.
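
The scan described above can be sketched in a few lines of Python 3. The sketch is illustrative only and makes several assumptions that the text does not settle: the names `find_sites` and `specific_sites` are hypothetical, only the given strand is scanned, the seed is taken as nucleotides 9 to 20 of each 23-nucleotide slice, and a seed counts as distinct when no other candidate slice in the input shares it.

```python
from collections import Counter


def find_sites(seq, window=23):
    """Yield (start, end, site) for each 23-nt window ending in GG.

    Positions are 1-based and inclusive, matching the layout of Table 1.
    """
    seq = seq.upper()
    for i in range(len(seq) - window + 1):  # step size of one nucleotide
        site = seq[i:i + window]
        if site.endswith("GG"):  # 'NGG' PAM: any base followed by GG
            yield i + 1, i + window, site


def specific_sites(records):
    """Keep candidate sites whose 12-nt seed occurs once among all candidates.

    `records` maps sequence identifiers to sequences. Judging uniqueness
    among candidate sites only is an assumption of this sketch.
    """
    hits = [(name, start, end, site)
            for name, seq in records.items()
            for start, end, site in find_sites(seq)]
    seed_counts = Counter(site[8:20] for _, _, _, site in hits)
    return [hit for hit in hits if seed_counts[hit[3][8:20]] == 1]


# Toy input: the first target site of Table 1 embedded at positions 6-28.
toy = {"Sequence_1": "AAAAA" + "ACTTCTTCGTCCAACTTCTTCGG" + "TT"}
for hit in specific_sites(toy):
    print(hit)
```

Counting seeds with a `Counter` needs a single pass over the candidate sites, which keeps memory use proportional to the number of candidates rather than to every 12-mer in the genome.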

To test the performance of SSFinder, we used two types of large datasets: (1) multiple sequences with a large cumulative size (27,416 Arabidopsis thaliana protein-coding genes totaling ~37 million bases from TAIR10, ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/) and (2) a single large sequence (the X chromosome of Drosophila melanogaster, ~22 million bases, accession number NC_004354.3). In the first dataset, a total of 1,780,846 sites were detected in 27,416 genes, of which 1,618,640 sites in 27,239 genes were specific for CRISPR-Cas binding. This result is in agreement with an earlier report [9]. In the second dataset, a total of 1,058,944 sites were detected, of which 977,257 were specific. The analyzed data are available on request. The average speed of the analysis was 34 and 40 kilobases per minute for the first and second datasets, respectively. The average distance between two specific target sites was 22.9 nucleotides in both tested organisms. These results indicate that SSFinder is a high-throughput tool for CRISPR-Cas target site prediction from large datasets in limited time.
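
The reported site counts can be related to the dataset sizes with simple arithmetic, sketched below in Python 3. The dataset sizes are the rounded values quoted in the text (~37 and ~22 million bases), so the computed spacing only approximates the reported 22.9 nucleotides; the dictionary keys and the function name `summarize` are hypothetical.

```python
# Figures quoted in the text; base counts are rounded approximations.
datasets = {
    "A. thaliana genes": {"bases": 37_000_000, "sites": 1_780_846, "specific": 1_618_640},
    "D. melanogaster X": {"bases": 22_000_000, "sites": 1_058_944, "specific": 977_257},
}


def summarize(d):
    """Return the fraction of specific sites and the mean spacing between them."""
    return {
        "specific_fraction": d["specific"] / d["sites"],
        "mean_spacing_nt": d["bases"] / d["specific"],
    }


for name, d in datasets.items():
    s = summarize(d)
    print(f"{name}: {s['specific_fraction']:.1%} specific, "
          f"about {s['mean_spacing_nt']:.1f} nt between specific sites")
```

With these rounded inputs, roughly nine in ten detected sites are specific in both datasets, and the mean spacing comes out between 22 and 23 nucleotides.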

4. Conclusion

We report SSFinder, a comprehensive tool for identifying specific CRISPR-Cas target sites with high reliability. It is freeware that is easy to edit, has low memory demand, and is compatible with many commonly used operating systems. The tool is useful for high-throughput in-house screening of large genomes in limited time and can accelerate functional genomics research based on the CRISPR-Cas system.

Supplementary Material

We compared the efficiency, access, and flexibility of SSFinder with those of other reported tools. We found SSFinder to be a more user-friendly and exhaustive tool than the others.

742482.f1.pdf (20.1KB, pdf)

Acknowledgments

The authors are thankful to the National Agri-Food Biotechnology Institute (NABI), Mohali, India, for the research facility. Santosh Kumar Upadhyay is thankful to the Department of Science and Technology (DST), Government of India, for the DST INSPIRE Faculty Fellowship.

Availability

SSFinder is a Python script, freely available at https://code.google.com/p/ssfinder/.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors' Contribution

Santosh Kumar Upadhyay conceived the idea, designed the experiment, analyzed the tool, and wrote the paper. Shailesh Sharma developed the script and performed the experiment. Santosh Kumar Upadhyay and Shailesh Sharma contributed equally to this work.

References

  • 1. Chen K, Gao C. TALENs: customizable molecular DNA scissors for genome engineering of plants. Journal of Genetics and Genomics. 2013;40(6):271–279. doi: 10.1016/j.jgg.2013.03.009.
  • 2. Zhang F, Maeder ML, Unger-Wallace E, et al. High frequency targeted mutagenesis in Arabidopsis thaliana using zinc finger nucleases. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(26):12028–12033. doi: 10.1073/pnas.0914991107.
  • 3. Cong L, Ran FA, Cox D, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339(6121):819–823. doi: 10.1126/science.1231143.
  • 4. Mali P, Yang L, Esvelt KM, et al. RNA-guided human genome engineering via Cas9. Science. 2013;339(6121):823–826. doi: 10.1126/science.1232033.
  • 5. Upadhyay SK, Kumar J, Alok A, Tuli R. RNA guided genome editing for multiple target gene mutations in wheat. G3: Genes, Genomes, Genetics. 2013;3(12):2233–2238. doi: 10.1534/g3.113.008847.
  • 6. Jore MM, Lundgren M, van Duijn E, et al. Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nature Structural & Molecular Biology. 2011;18(5):529–536. doi: 10.1038/nsmb.2019.
  • 7. Gasiunas G, Barrangou R, Horvath P, Siksnys V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(39):E2579–E2586. doi: 10.1073/pnas.1208507109.
  • 8. Fu Y, Foden JA, Khayter C, et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nature Biotechnology. 2013;31:822–826. doi: 10.1038/nbt.2623.
  • 9. Li JF, Norville JE, Aach J, et al. Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9. Nature Biotechnology. 2013;31(8):688–691. doi: 10.1038/nbt.2654.
  • 10. Qi LS, Larson MH, Gilbert LA, et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013;152(5):1173–1183. doi: 10.1016/j.cell.2013.02.022.
  • 11. Bikard D, Jiang W, Samai P, Hochschild A, Zhang F, Marraffini LA. Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Research. 2013;41(15):7429–7437. doi: 10.1093/nar/gkt520.
  • 12. Biswas A, Gagnon JN, Brouns SJJ, Fineran PC, Brown CM. CRISPRTarget: bioinformatic prediction and analysis of crRNA targets. RNA Biology. 2013;10(5):817–827. doi: 10.4161/rna.24046.
  • 13. Jinek M, East A, Cheng A, Lin S, Ma E, Doudna J. RNA-programmed genome editing in human cells. eLife. 2013;2:e00471. doi: 10.7554/eLife.00471.

Articles from BioMed Research International are provided here courtesy of Wiley