Nucleic Acids Research. 2020 Oct 13;49(D1):D871–D876. doi: 10.1093/nar/gkaa848

dbGuide: a database of functionally validated guide RNAs for genome editing in human and mouse cells

Alexander A Gooden 1,2, Christine N Evans 2,2, Timothy P Sheets 3,2, Michelle E Clapp 4, Raj Chari 5
PMCID: PMC7779039  PMID: 33051688

Abstract

With the technology's accessibility and ease of use, CRISPR has been employed widely in many different organisms and experimental settings. As a result, thousands of publications have used CRISPR to make specific genetic perturbations, which in itself establishes a resource of validated guide RNA sequences. While numerous computational tools exist to assist in the design and identification of candidate guide RNAs, their outputs are at best predictions, and researchers generally still need to test multiple sequences for functional activity. Here, we present dbGuide (https://sgrnascorer.cancer.gov/dbguide), a database of functionally validated guide RNA sequences for CRISPR/Cas9-based knockout in human and mouse. Our database contains not only computationally determined candidate guide RNA sequences but also, of even greater value, over 4000 sequences that have been functionally validated either through direct amplicon sequencing or through manual curation of literature from over 1000 publications. Finally, our established framework will allow for continual addition of newly published and experimentally validated guide RNA sequences for CRISPR/Cas9-based knockout, as well as incorporation of sequences from different gene editing systems, additional species and other types of site-specific functionalities such as base editing, gene activation, repression and epigenetic modification.

INTRODUCTION

CRISPR/Cas9-based genome editing has been an indispensable technology for understanding the biology of living organisms (1,2). As a result, a tremendous effort has been invested in aiding researchers in designing and executing CRISPR-based experiments by providing critical resources such as reagents and protocols as well as software tools that allow a user to rapidly identify guide RNA sequences with specific predicted on-target and off-target characteristics (3–18) (Table 1). However, these sequences are essentially predictions, often requiring assessment of multiple candidate guide RNAs.

Table 1.

List of sgRNA design tools

Name | URL | Species | Design? | Validated guides
CCTop (12) | https://crispr.cos.uni-heidelberg.de/ | Many | Yes | No
CHOPCHOP (6) | http://chopchop.cbu.uib.no/ | Many | Yes | No
CRISPOR (3) | http://crispor.tefor.net/ | Many | Yes | No
CRISPRseek (16) | http://bioconductor.org/packages/release/bioc/html/CRISPRseek.html | Many | Yes | No
CRISPRz (13) | https://research.nhgri.nih.gov/CRISPRz/ | Zebrafish | No | Yes
CRISPR-ERA (7) | http://crispr-era.stanford.edu/index.jsp | Many | Yes | No
CRISPR MultiTargeter (11) | http://www.multicrispr.net/ | Many | No | No
Cas-Designer (8) | http://www.rgenome.net/cas-designer/ | Many | Yes | No
DeepHF (14) | http://www.deephf.com/index/#/Predict | Human | No | No
E-CRISP (4) | http://www.e-crisp.org/E-CRISP/ | Many | Yes | No
EuPaGDT (9) | http://grna.ctegd.uga.edu/ | Many | Yes | No
inDelphi (18) | https://indelphi.giffordlab.mit.edu/ | Human, Mouse | Yes | No
sgRNAcas9 (15) | http://www.biootools.com/software.html | Many | Yes | No
sgRNA Designer (17) | https://portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design | Many | No | No
sgRNA Scorer 2.0 (32) | https://sgrnascorer.cancer.gov/ | Many | Yes | No
WU-CRISPR (5) | http://crispr.wustl.edu/ | Many | Yes | No
COMMERCIAL
Addgene | https://www.addgene.org/crispr/reference/grna-sequence/ | Many | No | Yes
CRISPR gRNA Design Tool (Atum) | https://www.atum.bio/eCommerce/cas9/input | Many | Yes | No
Benchling | www.benchling.com | Many | Yes | No
CRISPR Design Tool (Horizon Discovery) | https://horizondiscovery.com/en/products/tools/CRISPR-Design-Tool | Many | No | No
CRISPR sgRNA Design Tool (GenScript) | https://www.genscript.com/gRNA-design-tool.html | Many | No | No
CRISPR Cas9 Guide RNA Checker (IDT DNA) | https://www.idtdna.com/site/order/designtool/index/CRISPR_SEQUENCE | Many | Yes | No
True Designer Genome Editor (ThermoFisher Scientific) | https://www.thermofisher.com/us/en/home/life-science/genome-editing/geneart-crispr/geneart-crispr-search-and-design-tool.html | Many | Yes | No

Over the last 7 years, there has been an explosion of literature utilizing CRISPR/Cas9 to knock out specific genes or genomic regions, the majority of it in human or mouse model systems. The culmination of this work has provided an opportunity to mine the literature for guide RNA sequences shown to be functional at a genotypic and/or phenotypic level. Variation in reporting standards, typographical errors and relegation of sequences to supplementary materials are among the reasons why curation of this information has been non-trivial.

To this end, we present dbGuide, a database of functionally validated guide RNA sequences for CRISPR/Cas9-mediated knockout experiments in human or mouse cells (https://sgrnascorer.cancer.gov/dbguide). We have manually curated guide RNA sequences from over 1000 peer-reviewed articles, with each sequence directly referenced to its original publication. We are also making the results of targeted amplicon sequencing for ∼2000 unique sgRNA sequences, tested individually in human (293T) or mouse (NIH-3T3 or P19) cultured cells, publicly searchable in our database. In total, these efforts encompass nearly 6000 unique guide RNA sequences for which some level of validated activity exists. To our knowledge, this represents the largest database of functionally validated guide RNA sequences. We expect this to be a continually growing resource through multiple mechanisms. We have provided a downloadable template to encourage researchers to submit their newly published or validated sequences. Our computational framework will also allow for the inclusion of sequences for CRISPR systems other than that of Streptococcus pyogenes, of modalities such as base editing, gene activation/repression and epigenetic modification, and of sequences used in other species.

MATERIALS AND METHODS

Master list of human and mouse sgRNA sequences

In addition to functionally validated sgRNA sequences, and to provide further utility to the database, computationally designed sequences were obtained from a variety of sources, primarily focused on protein-coding genes (17,19–26). These sources are listed in Supplementary Table S1. Comma-delimited (CSV) files containing all sgRNA sequences present in our database for human (https://sgrnascorer.cancer.gov/downloads/hg_guide_info.csv.gz) and mouse (https://sgrnascorer.cancer.gov/downloads/mm_guide_info.csv.gz) can be downloaded directly from the dbGuide home page.
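As a minimal sketch of how one of these downloads might be read once saved locally, the Python below parses a gzip-compressed CSV into dictionaries keyed by the header row. The column layout of the real files is not described here, so the sketch assumes only that the first line is a header; the function names and the example column name are hypothetical.

```python
import csv
import gzip


def load_guides(path):
    """Read a gzip-compressed CSV file into a list of dicts keyed by header name."""
    # gzip.open in text mode ("rt") decompresses on the fly; newline="" is the
    # setting the csv module recommends for files it parses.
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


def count_by_value(rows, column):
    """Count rows per distinct value of one column (column name is an assumption)."""
    counts = {}
    for row in rows:
        key = row.get(column, "")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For example, `load_guides("hg_guide_info.csv.gz")` would return one dictionary per guide. For files too large to hold in memory, iterating over `csv.DictReader` directly instead of building a list would be the more careful choice.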

Published sgRNA sequences

A broad search of the PubMed database for ‘CRISPR OR Cas9’ was performed and yielded over 15 000 indexed citations. Subsequently, review articles, publications not using human or mouse cells, publications not using S. pyogenes Cas9 and publications not performing knockout experiments were excluded. In total, guide RNA sequences were sourced from 1322 peer-reviewed articles (Supplementary Table S2) in which the guide RNA sequences used could be ascertained. Although it is likely that some published sequences have been missed, a data template is provided with which missed sequences can be submitted for incorporation into subsequent database updates.
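The text does not say how the PubMed search was run. One way such a query could be reproduced programmatically is through the NCBI E-utilities ESearch endpoint; the sketch below only builds the request URL and does not contact the server. The helper name and the default page size are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for searching Entrez databases.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, retmax=100, retstart=0):
    """Build an ESearch URL returning PubMed IDs for `term` as JSON.

    retmax is the page size and retstart the offset of the first record,
    so a large result set can be fetched page by page.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
        "retstart": retstart,
    }
    return ESEARCH + "?" + urlencode(params)


url = pubmed_search_url("CRISPR OR Cas9")
```

The exclusion steps described above (reviews, other species, other Cas9 orthologs, non-knockout studies) are curation decisions and are not captured by this query.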

Targeted amplicon sequencing data

Quantitative editing data for nearly 2000 sgRNA sequences were generated internally by transfecting either Cas9/sgRNA plasmids (1 µg) or Cas9 protein (4 µg) with in vitro transcribed (IVT) RNA (2.25 µg) into mouse (NIH-3T3 or P19) or human (HEK293T) cells in a 24-well format. Cas9 protein was produced using plasmid Addgene-62731, a gift from Niels Geijsen (Addgene plasmid # 62731; http://n2t.net/addgene:62731; RRID:Addgene_62731) (27). IVT RNA was produced using a protocol similar to one previously published (28). Cas9 RNP complexing and transfection using Lipofectamine 2000 were performed essentially as previously described (29). Polymerase chain reaction was performed on genomic DNA, and amplicons were sequenced using the Illumina MiSeq V2 300-cycle kit in the paired-end (2 × 150) format. All sgRNA sequences tested are listed in Supplementary Table S3.

Mapping information and on/off-target scoring metrics

For all sgRNA sequences obtained, genomic locations of the corresponding target sites were obtained/verified using UCSC BLAT against either the hg38 or mm10 reference genome sequences downloaded from the UCSC Genome Browser (30,31). Subsequently, sgRNA locations were cross-referenced with the Gencode V32 (human) or Gencode VM23 (mouse) gene/transcript annotations to determine which transcript(s) each sgRNA could target.
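The cross-referencing step amounts to an interval-overlap test between each sgRNA target site and the annotated features of each transcript. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes half-open, BED-style coordinates and a pre-parsed list of exons, and the `Exon` tuple, function name and transcript identifiers are hypothetical.

```python
from collections import namedtuple

# One annotated exon; coordinates are 0-based and half-open, as in BED.
Exon = namedtuple("Exon", "chrom start end transcript_id")


def transcripts_hit(chrom, start, end, exons):
    """Return sorted transcript IDs with an exon overlapping [start, end) on chrom."""
    hits = set()
    for exon in exons:
        # Two half-open intervals overlap when each starts before the other ends.
        if exon.chrom == chrom and exon.start < end and start < exon.end:
            hits.add(exon.transcript_id)
    return sorted(hits)


exons = [
    Exon("chr1", 100, 200, "TX1"),
    Exon("chr1", 150, 260, "TX2"),
    Exon("chr2", 100, 200, "TX3"),
]
print(transcripts_hit("chr1", 190, 210, exons))
```

A linear scan like this is fine for a handful of guides; at genome scale an interval tree or a sorted-sweep approach would be the usual substitute.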

For on-target metrics, sgRNA Scorer 2.0 (32), Rule Set 2 (17) and FORECasT (33) scores were downloaded or calculated for each sgRNA; if a score could not be obtained, a value of ‘NV’ was recorded. For off-target analysis, GuideScan 1.0 (34) values were generated for each guide, and guides for which a score could not be obtained were likewise given a value of ‘NV’.

Analysis of targeted amplicon sequencing data

Paired-end raw FASTQ files were merged using FLASH (35), filtered for low-quality bases and mapped to the designated genomic locations in hg38/mm10 using bwa mem; sorted and indexed BAM files were then generated using samtools (36). A custom Python Snakemake (37) pipeline was written to calculate non-homologous end joining (NHEJ) mutation frequencies. This analysis pipeline is publicly available at https://github.com/rajchari2/ngs_amplicon_analysis.
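To make the mutation-frequency calculation concrete, the sketch below derives a total NHEJ rate and an out-of-frame share from the CIGAR strings of aligned reads. It is an illustration under simplifying assumptions rather than the published pipeline: every insertion or deletion in a read is counted as an editing event, with no window around the expected cut site and no filtering of sequencing errors, and all function names are hypothetical.

```python
import re

# A CIGAR string is a run of <length><operation> pairs, e.g. "70M1D79M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def net_indel(cigar):
    """Return (has_indel, net_length_change) for one CIGAR string."""
    inserted = deleted = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "I":
            inserted += int(length)
        elif op == "D":
            deleted += int(length)
    return (inserted + deleted) > 0, inserted - deleted


def nhej_summary(cigars):
    """Summarize reads as percentages: mutated reads, and out-of-frame among mutated."""
    total = mutated = out_of_frame = 0
    for cigar in cigars:
        total += 1
        has_indel, net = net_indel(cigar)
        if has_indel:
            mutated += 1
            # A net length change that is not a multiple of 3 shifts the reading frame.
            if net % 3 != 0:
                out_of_frame += 1
    total_rate = 100.0 * mutated / total if total else 0.0
    oof_rate = 100.0 * out_of_frame / mutated if mutated else 0.0
    return {"total_nhej_rate": total_rate, "oof_rate": oof_rate}


summary = nhej_summary(["150M", "150M", "70M1D79M", "70M3D77M"])
```

In this toy input, two of four reads carry an indel and one of those two is a 1-bp deletion, so both rates come out at 50%.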

RESULTS AND DISCUSSION

dbGuide uses a simple HTML interface that relies on the DataTables and Highcharts JavaScript libraries to display data in tabular and graphical formats, respectively (Figure 1A). The application is built in Python using Django, with a MySQL database used for data storage and retrieval. The database can be accessed without e-mail registration or login.

Figure 1.

Structure of the dbGuide database. (A) Components of the dbGuide database. The user interface was developed using HTML and JavaScript, and the application is managed on an Apache web server using Django. All underlying data are stored in a MariaDB (MySQL) database. (B) Schema of the MySQL database. Each species, currently limited to human and mouse, has a table of sgRNA sequences with pre-computed metrics and a table with gene annotation information. sgRNA sequences obtained from publications are stored in a single table, and the ‘species’ field is used to determine in which species the guide RNA was used. Similarly, sequences from targeted amplicon sequencing data also have the ‘species’ field for this purpose.

Within the MySQL relational database, each of human and mouse has a central sgRNA table, with genome target position and all metrics pre-calculated, and a gene annotation table, which holds the location of all protein-coding genes based on Gencode annotation. In addition, there are two separate tables summarizing the amplicon sequencing data and the publication-validated sequences. A depiction of the schema is shown in Figure 1B.
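The layout described here can be sketched with a small in-memory database. The example uses SQLite from the Python standard library in place of MySQL/MariaDB so that it is self-contained, and every table and column name in it is a hypothetical stand-in chosen for illustration; the real schema is the one shown in Figure 1B.

```python
import sqlite3

# Hypothetical, reduced version of the four kinds of tables described in the text.
SCHEMA = """
CREATE TABLE human_sgrna (
    guide_rna   TEXT PRIMARY KEY,
    position    TEXT,
    sgrnascorer REAL
);
CREATE TABLE human_gene (
    gene_symbol TEXT,
    chrom       TEXT,
    gene_start  INTEGER,
    gene_end    INTEGER
);
CREATE TABLE publication_guides (
    guide_rna TEXT,
    species   TEXT,
    pmid      TEXT
);
CREATE TABLE amplicon_guides (
    guide_rna       TEXT,
    species         TEXT,
    total_nhej_rate REAL,
    oof_rate        REAL
);
"""


def guide_status(conn, guide, species="human"):
    """Return 'validated' if the guide has publication or amplicon evidence, else 'design'."""
    row = conn.execute(
        """
        SELECT
            (SELECT COUNT(*) FROM publication_guides WHERE guide_rna = ? AND species = ?)
          + (SELECT COUNT(*) FROM amplicon_guides WHERE guide_rna = ? AND species = ?)
        """,
        (guide, species, guide, species),
    ).fetchone()
    return "validated" if row[0] > 0 else "design"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO publication_guides VALUES ('ACGTACGTACGTACGTACGT', 'human', '00000000')")
```

Keeping the evidence tables shared across species, with a `species` field, means that adding a species only requires a new sgRNA table and gene annotation table.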

Opening user interface

The introductory user interface is simple (Figure 2A). The user first specifies whether to search the human or mouse genome and can then provide a chromosomal position (in BED format), a gene symbol, an Ensembl gene or transcript ID, or an sgRNA spacer sequence (without the PAM). In addition, a link to an Excel spreadsheet template is provided for researchers wishing to contribute to the database.
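Because one search box accepts four kinds of input, the query has to be classified before it can be looked up. How dbGuide does this is not described; the sketch below shows one plausible approach with regular expressions. The patterns are assumptions: whitespace-separated BED fields, Ensembl human or mouse gene/transcript accessions, and a 20-nt spacer, which is the typical SpCas9 length but not a stated requirement.

```python
import re

# chrom, start, end separated by whitespace, as in the first three BED columns.
BED_RE = re.compile(r"^(chr\S+)\s+(\d+)\s+(\d+)$")
# Ensembl gene (G) or transcript (T) IDs for human (ENS...) or mouse (ENSMUS...).
ENSEMBL_RE = re.compile(r"^ENS(MUS)?[GT]\d{11}(\.\d+)?$")
# Assumed spacer format: exactly 20 unambiguous DNA bases, no PAM.
SPACER_RE = re.compile(r"^[ACGT]{20}$")


def classify_query(text):
    """Guess which kind of search term the user entered."""
    query = text.strip()
    if BED_RE.match(query):
        return "coordinate"
    if ENSEMBL_RE.match(query.upper()):
        return "ensembl_id"
    if SPACER_RE.match(query.upper()):
        return "spacer"
    # Anything else is treated as a gene symbol and left for the lookup to validate.
    return "gene_symbol"
```

The order of the checks matters: the stricter patterns are tested first and the gene symbol is the fallback, since symbols have no single fixed format.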

Figure 2.

Screenshots of the key user interfaces. (A) Opening window of the dbGuide database. The user must first select whether to search within the human or mouse genome and can then specify an HGNC gene symbol, a chromosomal coordinate in BED format, or an ENSEMBL gene/transcript ID. (B) Window depicting results of a search by gene symbol. Selectable rows of sgRNA sequences are returned with various pre-computed metrics and with PubMed identifiers if the sequence has been used in a peer-reviewed publication. (C) Graphical representation of sgRNA sequences for which targeted amplicon sequencing data were generated. Stacked bar plots display the percentage of NHEJ mutation events that result in an ‘in-frame’ (black) or ‘out of frame’ (blue) amino acid change.

Search results when searching by gene/coordinate/accession

Upon entering a valid search term, the user will be shown a table of results detailing all of the guide RNA sequences identified based on the search criteria (Figure 2B). The table displays the following columns:

Gene-Symbol: most recent official gene symbol for a gene,

guide_rna: sgRNA spacer sequence (without PAM),

crispr_system: Cas9 ortholog, currently restricted to SpCas9,

position: genomic location of the sgRNA target site in either the human (hg38) or mouse (mm10) assembly,

in_protein_coding_exon: whether the guide RNA targets a protein coding exon,

num_transcripts: the number of GENCODE transcripts targeted,

transcript_id_list: list of GENCODE transcripts (by ENSEMBL ID),

sgrnascorer: predicted activity using the sgRNA Scorer 2.0 algorithm (−3 to 3). The higher the value, the greater the predicted activity,

rule_set_2: predicted activity using the Rule Set 2 algorithm (0–1). The higher the value, the greater the predicted activity,

guide_scan_off: GuideScan off-target score (0–1). The higher the value, the higher the specificity,

forecast: Favored Outcomes of Repair Events at Cas9 targets (FORECasT) score for predicting mutational outcomes (in frame indel %, lower values preferred for knockouts),

total_nhej_rate: if amplicon sequencing data exist, a value between 0 and 100, or ‘None’ otherwise,

oof_rate: out of frame mutation rate, i.e. the proportion of mutated reads that would lead to an out of frame mutation,

num_publications: number of published articles using the selected guide RNA sequence,

ind_pmids: list of articles, by PubMed ID, which used the selected guide RNA sequence,

num_screens: number of pooled CRISPR screens in which the guide RNA sequence was enriched or depleted,

screen_pmids: list of articles, by PubMed ID, of CRISPR-based pooled screens in which this guide was enriched or depleted,

sources: list of sources from which the guide RNA sequence has been curated,

status: if the sgRNA sequence was functionally validated in a publication or targeted amplicon sequencing data exist, it is termed ‘validated’. Otherwise, it is termed ‘design’. This field allows the user to rapidly filter the list of returned sequences by typing ‘validate’ in the upper-right search box.
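The columns above lend themselves to simple programmatic filtering once a result table has been exported. The following sketch keeps only ‘validated’ rows and orders them by measured editing first and predicted activity second. It assumes each row is a dictionary of strings using the column names listed above, and that missing values appear as ‘NV’ or ‘None’; the ranking rule itself is an illustrative choice, not a recommendation from the database.

```python
def to_float(value):
    """Convert a table cell to float; 'NV', 'None' and blanks become None."""
    if value is None or str(value).strip() in ("", "NV", "None"):
        return None
    return float(value)


def rank_validated(rows):
    """Return validated guides, best first.

    Guides with a measured total_nhej_rate come first (highest rate first);
    ties and unmeasured guides fall back to the sgrnascorer prediction.
    """
    validated = [row for row in rows if row.get("status") == "validated"]

    def sort_key(row):
        nhej = to_float(row.get("total_nhej_rate"))
        score = to_float(row.get("sgrnascorer"))
        # Negate so that larger values sort earlier; a missing score sorts last.
        score_key = -score if score is not None else float("inf")
        return (nhej is None, -(nhej or 0.0), score_key)

    return sorted(validated, key=sort_key)


rows = [
    {"guide_rna": "A", "status": "design", "total_nhej_rate": "None", "sgrnascorer": "2.1"},
    {"guide_rna": "B", "status": "validated", "total_nhej_rate": "None", "sgrnascorer": "1.0"},
    {"guide_rna": "C", "status": "validated", "total_nhej_rate": "45.5", "sgrnascorer": "NV"},
    {"guide_rna": "D", "status": "validated", "total_nhej_rate": "80", "sgrnascorer": "0.5"},
]
ranked = [row["guide_rna"] for row in rank_validated(rows)]
```

Here guide A is dropped because its status is ‘design’, D and C are ordered by their measured NHEJ rates, and B follows because it has publication support but no amplicon data.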

Viewing sequence editing data

In addition to sorting by total NHEJ rate, one can view the data in a graphical format. After selecting sgRNA sequences for which data exist, the user can click ‘View Data’ to display a stacked bar plot that breaks down the total NHEJ rate into ‘in frame’ and ‘out of frame’ percentages (Figure 2C). This can be particularly important for knockout experiments, where a higher out of frame (OOF) mutation percentage gives a higher probability of protein loss.
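The arithmetic behind such a stacked bar is a simple split of the total NHEJ rate by the out-of-frame share. The sketch below assumes both inputs are percentages on a 0 to 100 scale, with `oof_rate` expressed relative to mutated reads; the scale used by the database itself is not stated, so this is an assumption, and the function name is hypothetical.

```python
def split_nhej(total_nhej_rate, oof_rate):
    """Split a total NHEJ percentage into (in_frame, out_of_frame) bar segments.

    total_nhej_rate: percentage of all reads carrying an NHEJ mutation.
    oof_rate: percentage of those mutated reads that are out of frame.
    """
    out_of_frame = total_nhej_rate * oof_rate / 100.0
    in_frame = total_nhej_rate - out_of_frame
    return in_frame, out_of_frame


in_frame, out_of_frame = split_nhej(80, 75)
```

With 80% of reads mutated and 75% of those out of frame, the bar would show 60% out of frame and 20% in frame, so the two segments sum to the total NHEJ rate.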

Downloading sequences for use in experiments

Upon identifying guide RNA sequences of interest, the user can export the sequences in a ‘ready to order’ format by clicking ‘Design’. A tab-delimited text file is generated listing the oligonucleotides needed for either ligation into a plasmid or in vitro transcription, together with a PDF protocol describing which compatible vectors can be used and how they can be obtained. It is highly recommended to select and download multiple guide RNA sequences when performing knockout experiments, to help ensure that any functional consequences observed are not due to spurious off-target activity from a single guide RNA.
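As an illustration of what a ‘ready to order’ file for plasmid ligation might contain, the sketch below builds a top and bottom oligonucleotide for each spacer and writes them as tab-delimited text. The exact oligo design used by dbGuide is not given in the text: the CACC and AAAC overhangs here are placeholder assumptions of the kind often paired with type IIS restriction cloning, the file layout is invented, and the overhangs required by a particular vector should be taken from its protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ligation_oligos(spacer, top_overhang="CACC", bottom_overhang="AAAC"):
    """Return (top, bottom) oligos for annealing; overhangs are assumed placeholders."""
    spacer = spacer.upper()
    top = top_overhang + spacer
    bottom = bottom_overhang + reverse_complement(spacer)
    return top, bottom


def order_sheet(guides):
    """Format (name, spacer) pairs as tab-delimited text, two oligos per guide."""
    lines = ["name\tsequence"]
    for name, spacer in guides:
        top, bottom = ligation_oligos(spacer)
        lines.append(f"{name}_top\t{top}")
        lines.append(f"{name}_bottom\t{bottom}")
    return "\n".join(lines) + "\n"


sheet = order_sheet([("g1", "AACG")])
```

The four-base spacer in the usage line is only there to keep the example readable; real spacers would be full-length sequences taken from the `guide_rna` column.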

DATA AVAILABILITY

Sorted BAM files for amplicon sequencing data are available from NCBI under Project Identifier PRJNA664634.

ACKNOWLEDGEMENTS

We would like to acknowledge Troy Taylor, William Gillette, Jane Jones and Dominic Esposito for kindly providing Cas9 protein. We would also like to acknowledge the Biomedical Informatics and Data Science (BIDS) and Frederick Research Computing Environment (FRCE) for providing assistance in web hosting and computational support.

Contributor Information

Alexander A Gooden, Laboratory Animal Sciences Program, Frederick National Lab for Cancer Research, Frederick, MD 21702, USA.

Christine N Evans, Laboratory Animal Sciences Program, Frederick National Lab for Cancer Research, Frederick, MD 21702, USA.

Timothy P Sheets, Laboratory Animal Sciences Program, Frederick National Lab for Cancer Research, Frederick, MD 21702, USA.

Michelle E Clapp, Laboratory Animal Sciences Program, Frederick National Lab for Cancer Research, Frederick, MD 21702, USA.

Raj Chari, Laboratory Animal Sciences Program, Frederick National Lab for Cancer Research, Frederick, MD 21702, USA.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Frederick National Laboratory for Cancer Research's Technology and Training Fellowship (to T.P.S.). Funding for open access charge: Frederick National Laboratory for Cancer Research.

Conflict of interest statement. None declared.

REFERENCES

1. Mali P., Yang L., Esvelt K.M., Aach J., Guell M., DiCarlo J.E., Norville J.E., Church G.M. RNA-guided human genome engineering via Cas9. Science. 2013; 339:823–826.
2. Cong L., Ran F.A., Cox D., Lin S., Barretto R., Habib N., Hsu P.D., Wu X., Jiang W., Marraffini L.A. et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013; 339:819–823.
3. Haeussler M., Schönig K., Eckert H., Eschstruth A., Mianné J., Renaud J.-B., Schneider-Maunoury S., Shkumatava A., Teboul L., Kent J. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 2016; 17:148.
4. Heigwer F., Kerr G., Boutros M. E-CRISP: fast CRISPR target site identification. Nat. Methods. 2014; 11:122–123.
5. Hiranniramol K., Chen Y., Liu W., Wang X. Generalizable sgRNA design for improved CRISPR/Cas9 editing efficiency. Bioinformatics. 2020; 36:2684–2689.
6. Labun K., Montague T.G., Krause M., Torres Cleuren Y.N., Tjeldnes H., Valen E. CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Res. 2019; 47:W171–W174.
7. Liu H., Wei Z., Dominguez A., Li Y., Wang X., Qi L.S. CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and activation. Bioinformatics. 2015; 31:3676–3678.
8. Park J., Bae S., Kim J.-S. Cas-Designer: a web-based tool for choice of CRISPR-Cas9 target sites. Bioinformatics. 2015; 31:4014–4016.
9. Peng D., Tarleton R. EuPaGDT: a web tool tailored to design CRISPR guide RNAs for eukaryotic pathogens. Microb. Genomics. 2015; 1:e000033.
10. Pliatsika V., Rigoutsos I. ‘Off-Spotter’: very fast and exhaustive enumeration of genomic lookalikes for designing CRISPR/Cas guide RNAs. Biol. Direct. 2015; 10:4.
11. Prykhozhij S.V., Rajan V., Gaston D., Berman J.N. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. PLoS One. 2015; 10:e0119372.
12. Stemmer M., Thumberger T., Del Sol Keyer M., Wittbrodt J., Mateo J.L. CCTop: an intuitive, flexible and reliable CRISPR/Cas9 target prediction tool. PLoS One. 2015; 10:e0124633.
13. Varshney G.K., Zhang S., Pei W., Adomako-Ankomah A., Fohtung J., Schaffer K., Carrington B., Maskeri A., Slevin C., Wolfsberg T. et al. CRISPRz: a database of zebrafish validated sgRNAs. Nucleic Acids Res. 2016; 44:D822–D826.
14. Wang D., Zhang C., Wang B., Li B., Wang Q., Liu D., Wang H., Zhou Y., Shi L., Lan F. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 2019; 10:4284.
15. Xie S., Shen B., Zhang C., Huang X., Zhang Y. sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites. PLoS One. 2014; 9:e100448.
16. Zhu L.J., Holmes B.R., Aronin N., Brodsky M.H. CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. PLoS One. 2014; 9:e108424.
17. Doench J.G., Fusi N., Sullender M., Hegde M., Vaimberg E.W., Donovan K.F., Smith I., Tothova Z., Wilen C., Orchard R. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 2016; 34:184–191.
18. Shen M.W., Arbab M., Hsu J.Y., Worstell D., Culbertson S.J., Krabbe O., Cassa C.A., Liu D.R., Gifford D.K., Sherwood R.I. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature. 2018; 563:646–651.
19. Doench J.G., Hartenian E., Graham D.B., Tothova Z., Hegde M., Smith I., Sullender M., Ebert B.L., Xavier R.J., Root D.E. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat. Biotechnol. 2014; 32:1262–1267.
20. Sanjana N.E., Shalem O., Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods. 2014; 11:783–784.
21. Hart T., Chandrashekhar M., Aregger M., Steinhart Z., Brown K.R., MacLeod G., Mis M., Zimmermann M., Fradet-Turcotte A., Sun S. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015; 163:1515–1526.
22. Chari R., Mali P., Moosburner M., Church G.M. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat. Methods. 2015; 12:823–826.
23. Wang T., Yu H., Hughes N.W., Liu B., Kendirli A., Klein K., Chen W.W., Lander E.S., Sabatini D.M. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic ras. Cell. 2017; 168:890–903.
24. Wang T., Birsoy K., Hughes N.W., Krupczak K.M., Post Y., Wei J.J., Lander E.S., Sabatini D.M. Identification and characterization of essential genes in the human genome. Science. 2015; 350:1096–1101.
25. Tzelepis K., Koike-Yusa H., De Braekeleer E., Li Y., Metzakopian E., Dovey O.M., Mupo A., Grinkevich V., Li M., Mazan M. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 2016; 17:1193–1205.
26. Henriksson J., Chen X., Gomes T., Ullah U., Meyer K.B., Miragaia R., Duddy G., Pramanik J., Yusa K., Lahesmaa R. et al. Genome-wide CRISPR screens in T helper cells reveal pervasive crosstalk between activation and differentiation. Cell. 2019; 176:882–896.
27. D’Astolfo D.S., Pagliero R.J., Pras A., Karthaus W.R., Clevers H., Prasad V., Lebbink R.J., Rehmann H., Geijsen N. Efficient intracellular delivery of native proteins. Cell. 2015; 161:674–690.
28. Richardson C.D., Ray G.J., Bray N.L., Corn J.E. Non-homologous DNA increases gene disruption efficiency by altering DNA repair outcomes. Nat. Commun. 2016; 7:12463.
29. Kim K., Park S.W., Kim J.H., Lee S.H., Kim D., Koo T., Kim K.-E., Kim J.H., Kim J.-S. Genome surgery using Cas9 ribonucleoproteins for the treatment of age-related macular degeneration. Genome Res. 2017; 27:419–426.
30. Kent W.J. BLAT–the BLAST-like alignment tool. Genome Res. 2002; 12:656–664.
31. Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. The human genome browser at UCSC. Genome Res. 2002; 12:996–1006.
32. Chari R., Yeo N.C., Chavez A., Church G.M. sgRNA Scorer 2.0: a species-independent model to predict CRISPR/Cas9 activity. ACS Synth. Biol. 2017; 6:902–904.
33. Allen F., Crepaldi L., Alsinet C., Strong A.J., Kleshchevnikov V., De Angeli P., Páleníková P., Khodak A., Kiselev V., Kosicki M. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 2018; 37:64–72.
34. Perez A.R., Pritykin Y., Vidigal J.A., Chhangawala S., Zamparo L., Leslie C.S., Ventura A. GuideScan software for improved single and paired CRISPR guide RNA design. Nat. Biotechnol. 2017; 35:347–349.
35. Magoč T., Salzberg S.L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011; 27:2957–2963.
36. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078–2079.
37. Köster J., Rahmann S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics. 2018; 34:3600.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press