2020 Sep 16;22:406–420. doi: 10.1016/j.omtn.2020.09.010

Table 2. List of Currently Available Tools for 4mC Sites Prediction Assessed in This Study

| Year | Tool^a | Classifier | Training/Independent Dataset Size | Features | Web Server | Evaluation Strategy | File Upload |
|---|---|---|---|---|---|---|---|
| 2017 | iDNA4mC^b | SVM | Chen dataset/– | RFHCP | yes | LOOCV | no |
| 2018 | 4mCPred^b | SVM | Chen dataset/– | PSTNP, EIIP | yes | LOOCV | no |
| 2018 | 4mcPred-SVM^b | SVM | Chen dataset/– | Kmer, MBE, DBE, LPDF | yes | 10-fold CV | yes |
| 2019 | Meta-4mCpred^b | RF, ERT, GB, SVM | Chen dataset/Manavalan dataset | Kmer, MBE, DPE, LPDF, RFHCP, DPCP, TPCP | yes | 10-fold CV | yes |
| 2019 | 4mcPred-IFL^b | SVM | Chen dataset/– | Kmer+MBE, DBE+LPDF, PCPs, PseDNC, KNN, EIIP, MMI, RFHCP | yes | 10-fold CV | yes |
| 2019 | 4mCCNN^b | CNN | Chen dataset/– | MBE | yes^c | 10-fold CV | |
| 2019 | 4mCpred-EL^d | RF, GB, ERT, SVM | (800 4mCs and 800 non-4mCs)/(180 4mCs and 180 non-4mCs) | Kmer, DPE+LPDF, RFHC, EIIP, MBE, DPCP, TPCP | yes | 10-fold CV | yes |
| 2019 | i4mC-ROSE^e | RF | (4,854 4mCs and 4,854 non-4mCs)/(1,617 4mCs and 1,617 non-4mCs) | KSNC, MBE, EIIP | yes | 10-fold CV | yes |
| 2020 | iEC4mC-SVM^d | SVM | (388 4mCs and 388 non-4mCs)/(134 4mCs and 134 non-4mCs) | MBE, RFHC, DAE, X-k-YCF, Kmer | | 10-fold CV | |
| 2020 | DNA4mC-LIP^d | | –/Manavalan dataset | integration of six existing predictors | yes | independent evaluation | yes |
| 2020 | 4mcDeep-CBI^d | CNN, BLSTM | (1,173 4mCs and 6,635 non-4mCs)/– | same as used in 4mcPred-IFL | | 3-fold CV | |
| 2020 | iDNA-MS^f | RF | 7,899 samples/7,898 samples | Kmer, RFHCP, MBE | yes | 5-fold CV | yes |
| 2020 | i4mC-Mouse^d | RF | (746 4mCs and 746 non-4mCs)/(160 4mCs and 160 non-4mCs) | Kmer, KSNC, MBE, EIIP | yes | 10-fold CV | yes |

Chen dataset contains C. elegans (4mCs, 1,554; non-4mCs, 1,554), D. melanogaster (4mCs, 1,769; non-4mCs, 1,769), A. thaliana (4mCs, 1,978; non-4mCs, 1,978), E. coli (4mCs, 388; non-4mCs, 388), Geoa. subterraneus (4mCs, 906; non-4mCs, 906), and Geob. pickeringii (4mCs, 569; non-4mCs, 569).

Manavalan dataset contains C. elegans (4mCs, 750; non-4mCs, 750), D. melanogaster (4mCs, 1,000; non-4mCs, 1,000), A. thaliana (4mCs, 1,250; non-4mCs, 1,250), E. coli (4mCs, 134; non-4mCs, 134), Geoa. subterraneus (4mCs, 350; non-4mCs, 350), and Geob. pickeringii (4mCs, 200; non-4mCs, 200).

SVM, support vector machine; RF, random forest; GB, gradient boosting; CNN, convolutional neural network; BLSTM, bidirectional long short-term memory network; ERT, extremely randomized tree; RFHCP, ring-function-hydrogen-chemical properties; PSTNP, position-specific trinucleotide propensity; EIIP, electron-ion interaction pseudopotential; Kmer, Kmer nucleotide frequency; MBE, mononucleotide binary encoding; DBE, dinucleotide binary encoding; LPDF, local position-specific dinucleotide frequency; DPE, dinucleotide binary profile encoding; DPCP, dinucleotide physicochemical properties; TPCP, trinucleotide physicochemical properties; PCP, physicochemical property; PseDNC, pseudo-dinucleotide composition; KNN, K-nearest neighbor; KSNC, k-space nucleotide composition; DAC, dinucleotide physicochemical properties autocorrelation; X-k-YCF, Xmer-kGap-Ymer composition frequency; ANF, accumulated nucleotide frequency; LOOCV, leave-one-out cross-validation; CV, cross-validation.

^b Tools contain six species-specific prediction models, namely for A. thaliana, C. elegans, D. melanogaster, E. coli, Geoa. subterraneus, and Geob. pickeringii.

^c Web server is not functional.

^d Tool contains one prediction model that predicts 4mC sites for a specific species.

^e Tool contains two prediction models, for F. vesca and Rosa chinensis.

^f Tool contains four species-specific models, namely for F. vesca, Casuarina equisetifolia, Saccharomyces cerevisiae, and Ts. SUP5-1.
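To make two of the feature abbreviations in the table concrete, the following is a minimal Python sketch of mononucleotide binary encoding (MBE) and Kmer nucleotide frequency as they are commonly defined. The function names `mbe` and `kmer_frequency`, the A/C/G/T bit order, and the toy sequence are illustrative assumptions; the individual tools listed above may differ in ordering, window size, or normalization.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed bit order; individual tools may use another


def mbe(seq):
    """Mononucleotide binary encoding: one 4-bit one-hot vector per base.

    Raises KeyError for characters other than A, C, G, T.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    vector = []
    for base in seq.upper():
        one_hot = [0] * len(NUCLEOTIDES)
        one_hot[index[base]] = 1
        vector.extend(one_hot)
    return vector


def kmer_frequency(seq, k=2):
    """Kmer nucleotide frequency: counts of all 4**k k-mers over
    overlapping windows, divided by the number of windows."""
    seq = seq.upper()
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(windows))
    all_kmers = ("".join(p) for p in product(NUCLEOTIDES, repeat=k))
    return {kmer: counts[kmer] / windows for kmer in all_kmers}


example = "ACGTAC"  # hypothetical toy sequence, not taken from any dataset
print(mbe(example)[:8])               # [1, 0, 0, 0, 0, 1, 0, 0]
print(kmer_frequency(example)["AC"])  # 0.4
```

MBE preserves positional information (the vector length grows with the sequence), whereas Kmer frequency summarizes composition in a fixed-length vector of 4**k values. The two encodings therefore capture different aspects of a sequence, which may be one reason several of the listed tools use both.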