Table 1.
iRF model summary and metrics
Model species | Source dataset | # of sgRNAs | Feature set | # of features | Description | Test species | R² | Pearson correlation |
---|---|---|---|---|---|---|---|---|
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Raw | 5 | Summary values of the sgRNA sequence, including GC content, Tm, MFE, gene density and distance to PAM | E. coli | 0.0406861 | 0.2007612 |
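E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Onehot | 5911 | Binary positional encoding of the 20 bp sgRNA nucleotide sequence | E. coli | 0.26004285 | 0.4914184 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | QCT | 316 | Quantitative metrics for H-bond and HL-gap based on the positional nucleotide sequence | E. coli | 0.24183122 | 0.4918057 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Raw.Onehot | 6916 | Raw + Onehot | E. coli | 0.26028286 | 0.4931724 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Raw.QCT | 312 | Raw + QCT | E. coli | 0.24177446 | 0.4939777 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Onehot.QCT | 6227 | Onehot + QCT | E. coli | 0.24905183 | 0.500817 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Full Matrix | 6232 | Raw + Onehot + QCT | E. coli | 0.24906667 | 0.5019173 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Full Matrix | 6232 | Raw + Onehot + QCT | H. sapiens | 0.00429969 | 0.06557198 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 5 | 5 | Based on the Full Matrix iRF model run with E. coli data, the top feature importance scores were used to generate new iRF models with 5, 10, 20, 40, 100, 200, 500 and 1000 features. | E. coli | 0.11240746 | 0.3436711 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 10 | 10 | | E. coli | 0.15779734 | 0.4019815 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 20 | 20 | | E. coli | 0.2017236 | 0.4458406 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 50 | 50 | | E. coli | 0.24529071 | 0.4903894 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 100 | 100 | | E. coli | 0.25119027 | 0.4967809 |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 200 | 200 | | | | |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 500 | 500 | | | | |
E. coli | Guo et al. 2016 | 40 468 [32 374 train] | Top 1000 | 1000 | | | | |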
H. sapiens | Doench et al. 2014 | 1278 [1022 train] | Full Matrix | 6172 | Raw + Onehot + QCT based on the H. sapiens sgRNA sequence set from Doench et al. 2014 | H. sapiens | 0.389120714 | 0.6525512 |
H. sapiens | Chuai et al. 2018 | 16 749 [13 399 train] | Full Matrix | 6172 | Raw + Onehot + QCT based on the H. sapiens sgRNA sequence set from Chuai et al. 2018 | H. sapiens | 0.229489979 | 0.486193 |
H. sapiens | Doench et al. 2014; Chuai et al. 2018 | 17 421 [13 936 train] | Full Matrix | 6172 | Raw + Onehot + QCT based on the H. sapiens sgRNA sequence sets from Doench et al. 2014 and Chuai et al. 2018 | H. sapiens | 0.211671332 | 0.4964907 |
E. coli + H. sapiens | Guo et al. 2016; Doench et al. 2014; Chuai et al. 2018 | 30 000 [24 000 train] | Full Matrix | 6172 | Raw + Onehot + QCT based on the E. coli sgRNA sequence set from Guo et al. 2016 and the H. sapiens sgRNA sequence sets from Doench et al. 2014 and Chuai et al. 2018 | E. coli + H. sapiens | 0.486194 | 0.6972761 [E. coli 0.504] [H. sapiens 0.491] |
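
To make the feature sets and metrics in Table 1 concrete, the following is a minimal Python sketch, not the pipeline used to produce the table. It computes GC content (one of the Raw features), a plain per-position binary encoding of a 20 bp guide, and the two reported metrics. All function names and the example guide are hypothetical. The encoding here yields only 4 × 20 = 80 columns, so it is an assumption about the simplest form of positional one-hot encoding and does not reproduce the 5911 Onehot features listed above.

```python
import math

NUCLEOTIDES = "ACGT"  # column order of the encoding (an assumption of this sketch)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the guide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def one_hot_positional(seq: str) -> list[int]:
    """Binary encoding with one block of four columns (A, C, G, T) per position."""
    encoding = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        encoding.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoding


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # hypothetical 20 bp guide, not from any dataset
    print(gc_content(guide))
    print(len(one_hot_positional(guide)))
```

R² and the Pearson correlation answer different questions: the former penalises any deviation of predictions from the observed values, while the latter only measures linear association, which is why the table reports both.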