2016 Aug 4;11(8):e0159644. doi: 10.1371/journal.pone.0159644

Table 7. Error analysis: average feature similarity for error cases on Naïve Bayes.

                Caenorhabditis    Danio rerio       Drosophila        Escherichia coli  Zea mays
Feature         FP       FN       FP       FN       FP       FN       FP       FN       FP       FN
#Instances      1644     72       2167     39       13879    4844     161      9        390      66
Description     0.322    0.320    0.293    0.372    0.250    0.515    0.147    0.172    0.216    0.428
Literature      0.115    0.027    0.440    0.243    0.031    0.471    0.003    0.000    0.013    0.232
Length          0.191    0.567    0.165    0.659    0.143    0.704    0.151    0.556    0.207    0.720
Identity        0.936    0.902    0.954    0.902    0.974    0.854    0.983    0.924    0.962    0.866
AP              0.015    0.018    0.008    0.032    0.027    0.060    0.037    0.167    0.054    0.277
Expect_Value    0.012    0.109    0.019    0.031    0.168    0.365    0.037    0.020    0.055    0.001
CDS_Identity    0.881    0.882    0.924    0.888    0.893    0.852    0.906    0.921    0.868    0.840
CDS_AP          0.018    0.022    0.006    0.032    0.020    0.072    0.022    0.146    0.009    0.413
CDS_Expect      0.458    0.348    0.596    0.299    1.126    0.36     0.753    0.589    0.614    0.056
TRS_Identity    0.403    0.512    0.392    0.345    0.426    0.424    0.430    0.548    0.540    0.840
TRS_AP          0.020    0.042    0.020    0.408    0.032    0.130    0.030    0.262    0.027    0.463
TRS_Expect      2.456    1.312    1.630    0.408    2.061    1.404    1.799    0.144    3.227    0.257

#Instances: number of instances; FP: false positives, distinct pairs classified as duplicates; FN: false negatives, duplicates classified as distinct pairs. Feature names are explained in Table 3. Numbers are averages, excluding pairs that do not have the specific feature.
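
The averaging described in the note can be illustrated with a minimal sketch. It assumes that a pair lacking a feature is represented as `None`. The function name `average_similarity` and the sample values are hypothetical and are not taken from the paper or its data.

```python
from statistics import mean
from typing import Optional, Sequence


def average_similarity(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the available similarity values for one feature.

    Assumption: None marks a pair that does not have the feature,
    so it is left out of both the sum and the count.
    """
    present = [v for v in values if v is not None]
    # If no pair has the feature, there is nothing to average.
    return mean(present) if present else None


# Hypothetical similarity values for one feature across four error-case pairs.
literature = [0.25, None, 0.75, None]
print(average_similarity(literature))  # averages only 0.25 and 0.75
```

Under this reading, the number of pairs contributing to an average can differ from one feature to the next and may be smaller than the #Instances count for that column.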