2024 Jan 8;11:1335901. doi: 10.3389/fbioe.2023.1335901

TABLE 4.

ML- and DL-based tools for genome editing applications.

Ref	ML models used	Dataset	Description and key contribution	Performance evaluation metrics	Limitation	Target
Chuai et al. (2018)	DeepCRISPR (DCDNN)	Thirteen distinct human cell lines produced a total of 0.68 billion sgRNA sequences	This computational framework surpasses existing in silico tools by combining sgRNA on-target/off-target site prediction into a single system with DL.	Spearman: 0.246, AUROC: 0.804, AUPRC: 0.303	The model acquires an understanding of which attributes are crucial for improved sgRNA structure, even when trained with a limited number of samples	On and Off-target
Lin and Wong (2018)	CNN and FNN, Random Forest, GBTs, and LR	GUIDE-seq Tsai et al. (2015), CRISPOR Concordet and Haeussler (2018)	The key contribution of this paper is the development and implementation of a deep CNN for accurately predicting off-target mutations in CRISPR-Cas9 gene editing	AUROC: 97.2% for CNN, AUROC: 97% for FNN	—	Off-target
Xue et al. (2018)	DeepCas9 (1D CNN)	Wang Wang et al. (2014), Doench V1 Doench et al. (2014), Doench V2 Doench et al. (2016), C. elegans (F et al. (2015), HCT116 Hart et al. (2015), Zebrafish Gagnon et al. (2014); Moreno-Mateos et al. (2015); Varshney et al. (2015), Chari Chari et al. (2015), Haeussler Haeussler et al. (2016), HL-60 Xu et al. (2015)	It is the first DL technique that can recognize CRISPR-Cas9 sgRNA activity directly from genetic sequences without the need for feature input	Spearman: 0.23–0.61	sgRNA activity in these datasets was limited entirely to clinical assays, in which the measured cleavage efficiency served as a clear indicator of knockout (KO) efficacy	On-target
Liu et al. (2019)	SeqCrispr (RNN + CNN + transfer learning)	DeepCRISPR Chuai et al. (2018), CRISPR-Cpf1 Zaidi et al. (2017)	SeqCrispr is a DL model that integrates context-specific gene network features	Spearman: 0.77	Limited knowledge of gene activity and its fluctuating effects on phenotype, together with the difficulty of interpreting computational models biologically, restricts the predictive model’s efficiency	On-target
Wang et al. (2019)	DeepHF (RNN)	Approximately 50,000 gRNAs; the largest gRNA on-target activity dataset for mammalian cells	DeepHF extracts features with a Bi-LSTM and combines them with manually created biological features to build the final model. The study identified important sequence characteristics linked to gRNA activity and also assessed several ML algorithms for gRNA activity prediction	Spearman: 0.867, 0.862, and 0.860	Because only a small amount of data was available, the authors could not determine which algorithm performed better than the others on endogenous sites	On-target
Shrawgi and Sisodia (2019)	DeepSgRNA (CNN with hierarchical feature generation)	40,000 sgRNA sequence examples taken from the GenomeCRISPR project database	DeepSgRNA identifies and predicts guide RNAs to improve performance. The proposed model requires no manually created features	Spearman: 0.82, AUROC: 0.85	Off-target effects of specific sgRNAs were not considered in this study	On-target
Wang and Zhang (2019)	CNN with 5 layers + transfer learning	Cas9, eSpCas9, Cas9 (ΔrecA) Yue et al. (2020)	The main contribution of this paper is the development of a 5-layer CNN for predicting sgRNA activity in prokaryotic and eukaryotic species. The model takes 43-nt DNA sequences as input and predicts on-target activity	Spearman: 0.582, 0.7105, 0.360	The model does not perform well in predicting on-target activity in the Cas9 (ΔrecA) scenario	On-target
Aktas et al. (2019)	CNN, MLP, Bi-LSTM	DeepCRISPR Chuai et al. (2018)	In this work, sgRNA target prediction for CRISPR/Cas9 was carried out with DL to reduce genomic aberrations	Accuracy: 96.7%	Some of the mistargeted positions caused unwanted genome distortions	Off-target and on-target
Kim et al. (2019)	DeepSpCas9 (3 1D-CNN)	DeepSpCas9	It accurately predicted the activity of the SpCas9 enzyme	Spearman: 0.73	The size of the training datasets was not ideal	On-target
Liu et al. (2020)	CnnCrispr (Bi-LSTM and CNN)	DeepCRISPR Chuai et al. (2018)	CnnCrispr was proposed to predict the off-target propensity of sgRNAs at specific DNA fragments	AUROC: 0.957, AUPRC: 0.429	RNNs can implement memory functions, but their capacity is restricted by the risk of exploding or vanishing gradients	Off-target
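
The performance column in the table reports Spearman correlation (for on-target activity regression) and AUROC/AUPRC (for off-target classification). The following is a minimal, dependency-free sketch of how such metrics are commonly computed. The function names and the toy data are hypothetical and are not taken from any of the cited studies. Average precision is used here as one common step-wise estimate of AUPRC, which may differ from the procedure used in an individual paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic.

    Equals the probability that a random positive scores above a random
    negative. Assumes labels are 0/1 and both classes are present.
    """
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean precision at the rank of each positive.

    Assumes at least one positive; tied scores are kept in input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for position, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / position
    return total / hits


# Hypothetical toy data, for illustration only.
measured_activity = [0.10, 0.40, 0.35, 0.80, 0.65]
predicted_activity = [0.20, 0.30, 0.50, 0.90, 0.60]
off_target_labels = [0, 0, 1, 0, 1, 1]
off_target_scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]

rho = spearman(measured_activity, predicted_activity)
roc = auroc(off_target_labels, off_target_scores)
ap = average_precision(off_target_labels, off_target_scores)
print(f"Spearman={rho:.3f}  AUROC={roc:.3f}  AP={ap:.3f}")
```

Off-target datasets are typically highly imbalanced, with far more negative than positive sites. This is one reason a model can show a high AUROC alongside a much lower AUPRC, as in the DeepCRISPR and CnnCrispr rows.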