2016 Aug 4;11(8):e0159644. doi: 10.1371/journal.pone.0159644

Table 6. Ablation study of record features for duplicate classification.

Organism            Meta          Seq           SQ            SQC           SQM           All
                    Pre    Rec    Pre    Rec    Pre    Rec    Pre    Rec    Pre    Rec    Pre    Rec
Caenorhabditis
  Naïve Bayes       0.633  0.628  0.714  0.714  0.872  0.833  0.849  0.808  0.899  0.880  0.852  0.809
  Decision tree     0.815  0.730  0.816  0.814  0.971  0.971  0.979  0.979  0.980  0.980  0.981  0.981
Danio
  Naïve Bayes       0.656  0.622  0.696  0.657  0.817  0.766  0.839  0.775  0.831  0.797  0.839  0.777
  Decision tree     0.815  0.730  0.816  0.814  0.971  0.971  0.979  0.979  0.980  0.980  0.958  0.958
Drosophila
  Naïve Bayes       0.945  0.941  0.719  0.718  0.860  0.827  0.882  0.849  0.973  0.973  0.983  0.983
  Decision tree     0.951  0.950  0.950  0.950  0.996  0.996  0.998  0.998  0.999  0.999  0.999  0.999
Escherichia
  Naïve Bayes       0.778  0.654  0.842  0.820  0.979  0.979  0.937  0.930  0.972  0.972  0.927  0.918
  Decision tree     0.719  0.717  0.842  0.836  0.982  0.982  0.981  0.981  0.981  0.981  0.981  0.981
Zea
  Naïve Bayes       0.894  0.881  0.882  0.855  0.987  0.986  0.987  0.986  0.984  0.984  0.986  0.986
  Decision tree     0.961  0.960  0.965  0.965  0.997  0.997  0.998  0.998  0.998  0.998  0.998  0.998

Pre: average precision for two classes (DU and DI); Rec: average recall; Meta: meta-data features; Seq: sequence identity and length ratio; Q: alignment-quality features, such as Expect_value; SQ: combination of Seq and Q; C: coding-region features, such as CDS_identity; SQC: combination of Seq, Q and C; SQM: combination of Seq, Q and Meta; All: all features.
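
The footnote defines Pre and Rec as averages over the two classes DU and DI. The sketch below shows one way such averages could be computed from predicted labels. It assumes an unweighted (macro) mean over the two classes; the source does not say whether the average is weighted by class size. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def class_precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Tuple[float, float]:
    """Precision and recall for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision_recall(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = ("DU", "DI"),
) -> Tuple[float, float]:
    """Unweighted mean of per-class precision and recall (assumed averaging scheme)."""
    scores = [class_precision_recall(y_true, y_pred, label) for label in labels]
    avg_pre = sum(p for p, _ in scores) / len(scores)
    avg_rec = sum(r for _, r in scores) / len(scores)
    return avg_pre, avg_rec


# Toy labels for illustration only; these are not data from the table.
y_true = ["DU", "DU", "DU", "DI", "DI", "DI", "DI", "DI"]
y_pred = ["DU", "DU", "DI", "DI", "DI", "DI", "DI", "DU"]

avg_pre, avg_rec = average_precision_recall(y_true, y_pred)
print(f"Pre = {avg_pre:.3f}, Rec = {avg_rec:.3f}")  # Pre = 0.733, Rec = 0.733
```

In this toy case, DU has precision and recall of 2/3 and DI has 4/5, so both averages are 22/30. A weighted average would give a different figure whenever the two classes differ in size.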