2024 Jan 8;11:1335901. doi: 10.3389/fbioe.2023.1335901

TABLE 3.

Databases associated with genome editing research for the development of AI models.

| Dataset name | Data description | Data link | Type of editing | Target | Machine learning model used |
| --- | --- | --- | --- | --- | --- |
| CHANGE-seq data, Lazzarotto et al. (2020) (publicly available) | A total of 201,934 off-target sites scattered throughout the human genome | https://github.com/tsailabSJ/changeseq | CRISPR-Cas9 genome editing | Off-target | GBT |
| DeepHF data, Wang et al. (2019) (publicly available) | 50,000 gRNAs per nuclease, collectively targeting approximately 20,000 genes | http://www.deephf.com/ | CRISPR-Cas9 GED | On-target | RNN (recurrent neural network), Bi-LSTM (bidirectional long short-term memory) |
| Abadi et al. (2017) (publicly available) | 33 sets of sgRNAs, each associated with its specific targets | https://doi.org/10.1371/journal.pcbi.1005807.s014 | CRISPR-Cas9 GED | Off-target | Random forest |
| GenomeCRISPR database, Rauscher et al. (2017) (publicly available) | A total of 400,000 sgRNA sequences from the GenomeCRISPR project dataset | http://genomecrispr.org/ | CRISPR-Cas9 GED | On-target | CNN (named DeepSgRNA) |
| GUIDE-seq data, Tsai et al. (2015) (publicly available) | RNA-guided nucleases examined at different sites in two human cell lines, U2OS and HEK293 | https://github.com/tsailabSJ/guideseq | CRISPR-Cas9 genome editing | Off-target | CRISTA (CRISPR Target Assessment, using RF regression) |
| Arbab et al. (2020) (publicly available) | Up to 10,638 sgRNA-target pairs, randomly divided into partitions | https://ars.els-cdn.com/content/image/1-s2.0-S0092867420306322-mmc5.csv | BED | On-target | Be-Hive (autoregressive neural network) |
| Kim et al. (2023) (publicly available) | Nine Cas9 variants | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA821929/ | BED | On- and off-target | SVM, L1-regularized LR, L2-regularized LR, AdaBoost, and Random Forest |
| Pallaseni et al. (2022) (publicly available) | Uses the datasets from Arbab et al. (2020) and Song et al. (2020) | https://www.ebi.ac.uk/ena/browser/home | BED | Off-target | GBT |
| Li et al. (2022) (private) | 1,134 target sequences | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA885770/ | BED | On-target | XGBoost |
| Kim et al. (2021) (publicly available) | 54,836 pairs of pegRNAs and their corresponding target sequences | https://github.com/julianeweller/MinsePIE | PED | On- and off-target | DeepPE |
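
The random partitioning mentioned for the Arbab et al. (2020) data can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function name `split_pairs`, the 80/20 split, the fixed seed, and the placeholder records are all assumptions made for illustration, and no real sgRNA or target sequences are used.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Shuffle a copy of `pairs` and return (train, test) lists.

    Hypothetical helper: the split ratio and seed are illustrative
    assumptions, not values taken from any of the cited studies.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)       # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for (sgRNA, target) pairs; not real data.
pairs = [(f"sgRNA_{i}", f"target_{i}") for i in range(10)]
train, test = split_pairs(pairs)
print(len(train), len(test))  # 8 2
```

Seeding a local `random.Random` instance, rather than the module-level generator, keeps the partition reproducible without affecting other code that draws random numbers.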