Table 2.
Year | Tool<sup>a</sup> | Classifier | Training/Independent Dataset Size | Features | Web Server | Evaluation Strategy | File Upload |
---|---|---|---|---|---|---|---|
2017 | iDNA4mC<sup>b</sup> | SVM | Chen dataset/– | RFHCP | yes | LOOCV | no |
2018 | 4mCPred<sup>b</sup> | SVM | Chen dataset/– | PSTNP, EIIP | yes | LOOCV | no |
2018 | 4mcPred-SVM<sup>b</sup> | SVM | Chen dataset/– | Kmer, MBE, DBE, LPDF | yes | 10-fold CV | yes |
2019 | Meta-4mCpred<sup>b</sup> | RF, ERT, GB, SVM | Chen dataset/Manavalan dataset | Kmer, MBE, DPE, LPDF, RFHCP, DPCP, TPCP | yes | 10-fold CV | yes |
2019 | 4mcPred-IFL<sup>b</sup> | SVM | Chen dataset/– | Kmer+MBE, DBE+LPDF, PCPs, PseDNC, KNN, EIIP, MMI, RFHCP | yes | 10-fold CV | yes |
2019 | 4mCCNN<sup>b</sup> | CNN | Chen dataset/– | MBE | yes<sup>c</sup> | 10-fold CV | – |
2019 | 4mCpred-EL<sup>d</sup> | RF, GB, ERT, SVM | (800 4mCs and 800 non-4mCs)/(180 4mCs and 180 non-4mCs) | Kmer, DPE+LPDF, RFHC, EIIP, MBE, DPCP, TPCP | yes | 10-fold CV | yes |
2019 | i4mC-ROSE<sup>e</sup> | RF | (4,854 4mCs and 4,854 non-4mCs)/(1,617 4mCs and 1,617 non-4mCs) | KSNC, MBE, EIIP | yes | 10-fold CV | yes |
2020 | iEC4mC-SVM<sup>d</sup> | SVM | (388 4mCs and 388 non-4mCs)/(134 4mCs and 134 non-4mCs) | MBE, RFHC, DAE, X-k-YCF, Kmer | – | 10-fold CV | – |
2020 | DNA4mC-LIP<sup>d</sup> | – | –/Manavalan dataset | integration of six existing predictors | yes | independent evaluation | yes |
2020 | 4mcDeep-CBI<sup>d</sup> | CNN, BLSTM | (1,173 4mCs and 6,635 non-4mCs)/– | same as used in 4mcPred-IFL | – | 3-fold CV | – |
2020 | iDNA-MS<sup>f</sup> | RF | 7,899 samples/7,898 samples | Kmer, RFHCP, MBE | yes | 5-fold CV | yes |
2020 | i4mC-Mouse<sup>d</sup> | RF | (746 4mCs and 746 non-4mCs)/(160 4mCs and 160 non-4mCs) | Kmer, KSNC, MBE, EIIP | yes | 10-fold CV | yes |
The Chen dataset contains C. elegans (4mCs, 1,554; non-4mCs, 1,554), D. melanogaster (4mCs, 1,769; non-4mCs, 1,769), A. thaliana (4mCs, 1,978; non-4mCs, 1,978), E. coli (4mCs, 388; non-4mCs, 388), Geoa. subterraneus (4mCs, 906; non-4mCs, 906), and Geob. pickeringii (4mCs, 569; non-4mCs, 569).
The Manavalan dataset contains C. elegans (4mCs, 750; non-4mCs, 750), D. melanogaster (4mCs, 1,000; non-4mCs, 1,000), A. thaliana (4mCs, 1,250; non-4mCs, 1,250), E. coli (4mCs, 134; non-4mCs, 134), Geoa. subterraneus (4mCs, 350; non-4mCs, 350), and Geob. pickeringii (4mCs, 200; non-4mCs, 200).
Abbreviations: SVM, support vector machine; RF, random forest; GB, gradient boosting; CNN, convolutional neural network; BLSTM, bidirectional long short-term memory network; ERT, extremely randomized tree; RFHCP, ring-function-hydrogen-chemical properties; PSTNP, position-specific trinucleotide propensity; EIIP, electron-ion interaction pseudopotential; Kmer, Kmer nucleotide frequency; MBE, mononucleotide binary encoding; DBE, dinucleotide binary encoding; LPDF, local position-specific dinucleotide frequency; DPE, dinucleotide binary profile encoding; DPCP, dinucleotide physicochemical properties; TPCP, trinucleotide physicochemical properties; PCP, physicochemical property; PseDNC, pseudo-dinucleotide composition; KNN, K-nearest neighbor; KSNC, k-space nucleotide composition; DAC, dinucleotide physicochemical properties autocorrelation; X-k-YCF, Xmer-kGap-Ymer composition frequency; ANF, accumulated nucleotide frequency; LOOCV, leave-one-out cross-validation; CV, cross-validation.
<sup>a</sup> The tool URLs are as follows: iDNA4mC, http://lin-group.cn/server/iDNA4mC/; 4mCPred, http://server.malab.cn/4mCPred/; 4mcPred-SVM, http://server.malab.cn/4mcPred-SVM/; Meta-4mCpred, http://thegleelab.org/Meta-4mCpred/; 4mcPred-IFL, http://server.malab.cn/4mcPred-IFL/; 4mCCNN, https://home.jbnu.ac.kr/NSCL/4mCCNN.htm; 4mCpred-EL, http://thegleelab.org/4mCpred-EL/; i4mC-ROSE, http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/; DNA4mC-LIP, http://i.uestc.edu.cn/DNA4mC-LIP/; iDNA-MS, http://lin-group.cn/server/iDNA-MS/; i4mC-Mouse, http://kurata14.bio.kyutech.ac.jp/i4mC-Mouse/.
<sup>b</sup> The tool contains six species-specific prediction models, for A. thaliana, C. elegans, D. melanogaster, E. coli, Geoa. subterraneus, and Geob. pickeringii.
<sup>c</sup> The web server is not functional.
<sup>d</sup> The tool contains a single prediction model, which predicts 4mC sites for one specific species.
<sup>e</sup> The tool contains two prediction models, for F. vesca and Rosa chinensis.
<sup>f</sup> The tool contains four species-specific models, for F. vesca, Casuarina equisetifolia, Saccharomyces cerevisiae, and Ts. SUP5-1.
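
Two of the sequence encodings that recur in the Features column, MBE and Kmer, can be illustrated with a minimal sketch. This is not the implementation used by any of the listed tools: the base order `ACGT`, the all-zero vector for ambiguous bases, and the function names are assumptions made here for illustration.

```python
from itertools import product

# Assumed base order; individual tools may order or encode bases differently.
BASES = "ACGT"


def mononucleotide_binary_encoding(seq):
    """MBE-style encoding: each base becomes four bits, concatenated in order.

    Bases outside ACGT (for example N) become four zeros (an assumption).
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def kmer_frequency(seq, k=2):
    """Normalized frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows of width k
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases
            counts[window] += 1
    return [counts[m] / total for m in kmers]
```

For a sequence of length L, the MBE vector has 4L entries and the k-mer vector has 4^k entries, so the two can simply be concatenated into one feature vector before being passed to a classifier such as an SVM or RF.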