TABLE 4.
ML- and DL-based tools for genome editing applications.
| Ref | ML models used | Dataset | Description and key contribution | Performance evaluation metrics | Limitation | Target |
|---|---|---|---|---|---|---|
| Chuai et al. (2018) | DeepCRISPR (DCDNN) | Approximately 0.68 billion sgRNA sequences from 13 distinct human cell lines | A computational framework that unifies sgRNA on-target and off-target site prediction in a single DL system and surpasses existing in silico tools | Spearman: 0.246, AUROC: 0.804, AUPRC: 0.303 | The model acquires an understanding of which attributes are crucial for improved sgRNA structure, even when trained with a limited number of samples | On and Off-target |
| Lin and Wong (2018) | CNN and FNN, Random Forest, GBTs, and LR | GUIDE-seq (Tsai et al., 2015), CRISPOR (Concordet and Haeussler, 2018) | Develops and implements a deep CNN that accurately predicts off-target mutations in CRISPR-Cas9 gene editing | AUROC: 97.2% for CNN, AUROC: 97% for FNN | - | Off-target |
| Xue et al. (2018) | DeepCas9 (1D CNN) | Wang (Wang et al., 2014), Doench V1 (Doench et al., 2014), Doench V2 (Doench et al., 2016), C. elegans (F et al., 2015), HCT116 (Hart et al., 2015), Zebrafish (Gagnon et al., 2014; Moreno-Mateos et al., 2015; Varshney et al., 2015), Chari (Chari et al., 2015), Haeussler (Haeussler et al., 2016), HL-60 (Xu et al., 2015) | The first DL technique that can recognize CRISPR-Cas9 sgRNA activity directly from genetic sequences without the need for feature input | Spearman: 0.23-0.61 | The sgRNA activity in these datasets was limited entirely to clinical assays, in which the measured cleavage efficiency served as a clear indicator of knockout (KO) efficacy | On-target |
| Liu et al. (2019) | SeqCrispr (RNN + CNN + transfer learning) | DeepCRISPR (Chuai et al., 2018), CRISPR-Cpf1 (Zaidi et al., 2017) | SeqCrispr is a DL model that integrates context-specific gene network features | Spearman: 0.77 | Limited knowledge of gene activity, its fluctuating effects on phenotype, and the difficulty of interpreting computational models biologically all restrict the predictive model’s efficiency | On-target |
| Wang et al. (2019) | DeepHF (RNN) | Approximately 50,000 gRNAs; the largest gRNA on-target activity dataset for mammalian cells | DeepHF extracts features with a Bi-LSTM and combines them with manually created biological features to build the final model. The study also identified important sequence characteristics linked to gRNA activity and assessed several ML algorithms for gRNA activity prediction | Spearman: 0.867, 0.862, and 0.860 | Because of the small amount of data available, the authors could not determine which algorithm performed best on endogenous sites | On-target |
| Shrawgi and Sisodia (2019) | DeepSgRNA (CNN with hierarchical feature generation abilities) | 40,000 sgRNA sequence examples taken from the GenomeCRISPR project database | DeepSgRNA identifies and predicts guide RNAs to improve performance. The proposed model requires no hand-crafted features | Spearman: 0.82, AUROC: 0.85 | The off-target effects of specific sgRNAs were not considered in this investigation | On-target |
| Wang and Zhang (2019) | CNN with 5 layers + transfer learning | Cas9, eSpCas9, Cas9 (ΔrecA) (Yue et al., 2020) | Develops a 5-layer CNN for predicting sgRNA activity in prokaryotic and eukaryotic species. The model takes 43-nt DNA sequences as input and predicts on-target activity | Spearman: 0.582, 0.7105, 0.360 | The model does not perform well in predicting on-target activity for the Cas9 (ΔrecA) scenario | On-target |
| Aktas et al. (2019) | CNN, MLP, Bi-LSTM | DeepCRISPR (Chuai et al., 2018) | DL-based sgRNA target prediction for CRISPR/Cas9, carried out to reduce the genomic aberrations caused by mistargeting | Accuracy: 96.7% | Some of the mistargeted positions caused unwanted genome distortions | On and Off-target |
| Kim et al. (2019) | DeepSpCas9 (3 1D-CNN) | DeepSpCas9 | It accurately predicted the activity of the SpCas9 enzyme | Spearman: 0.73 | The size of the training datasets was not ideal | On-target |
| Liu et al. (2020) | CnnCrispr (Bi-LSTM and CNN) | DeepCRISPR (Chuai et al., 2018) | CnnCrispr was proposed to predict the off-target propensity of an sgRNA at specific DNA fragments | AUROC: 0.957, AUPRC: 0.429 | RNNs can implement memory functions, but their capacity is restricted by the possibility of exploding or vanishing gradients | Off-target |