. 2016 Oct 31;8:60. doi: 10.1186/s13321-016-0173-z

Table 2.

Overview of results

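| Selection of fragments | Interpretable fragments | Fast processing (low num. features) | Best performance: RF | Best performance: SVM | Best performance: NB |
|---|---|---|---|---|---|
| Unprocessed | Yes | | Yes | Yes | |
| Folded | | Yes | | | |
| Filtered | Yes | Yes | Yes | | Yes |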

Unprocessed fragments yield random forest (RF) and support vector machine (SVM) models with good performance and retain interpretability, but incur a high computational cost. Folded fragments allow fast processing, but produce inferior models and are not interpretable because of bit collisions. Filtered fragments yield the best naive Bayes (NB) models and can be used to build RF models that are as good as those built with unprocessed fragments. Filtered fragments also retain interpretability and allow fast processing.
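The loss of interpretability through bit collisions can be illustrated with a minimal sketch. It assumes that folding means hashing each fragment to a position modulo the length of the bit vector; the hash function (`zlib.crc32`), the helper names `fold_fragments` and `collisions`, and the example fragment strings are hypothetical choices for illustration, not the scheme used in the study.

```python
import zlib


def fold_fragments(fragments, n_bits):
    """Assign each fragment to one of n_bits positions (hash modulo n_bits).

    Returns a dict mapping bit position -> list of fragments stored there.
    """
    bit_to_fragments = {}
    for frag in fragments:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        bit = zlib.crc32(frag.encode("utf-8")) % n_bits
        bit_to_fragments.setdefault(bit, []).append(frag)
    return bit_to_fragments


def collisions(bit_to_fragments):
    """Keep only the bit positions that are shared by more than one fragment."""
    return {bit: frags for bit, frags in bit_to_fragments.items() if len(frags) > 1}


# Hypothetical fragment identifiers (SMILES-like strings, illustration only)
fragments = ["c1ccccc1", "C(=O)O", "CN", "CCl", "C=O"]

# Five fragments folded into four bits: at least one bit must be shared
folded = fold_fragments(fragments, n_bits=4)
for bit, frags in sorted(collisions(folded).items()):
    print(f"bit {bit} is set by any of: {frags}")
```

A set bit that is shared by several fragments no longer identifies which substructure is present, so a model feature built on that bit cannot be traced back to a single fragment.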

In summary, unprocessed (all) fragments are a good option if enough computational resources are available to optimize SVMs and the large number of (often redundant) features does not hinder the interpretation of predictions. Otherwise, filtered fragments should be preferred.

In general, RF models yield good results without parameter tuning; however, SVM models are usually better once their parameters have been optimized (see the section on parameter optimization).
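The contrast between an untuned RF and a tuned SVM can be sketched as follows. This is an illustration under stated assumptions: it uses scikit-learn, which is not necessarily the toolkit used in the study, a synthetic dataset in place of real fragment features, and a hypothetical parameter grid. The scores it prints say nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a fragment feature matrix (assumption, not real data)
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=8, random_state=0
)

# Random forest with library defaults: no parameter tuning
rf = RandomForestClassifier(random_state=0)
rf_score = cross_val_score(rf, X, y, cv=3).mean()

# SVM with a small grid search over C and gamma (hypothetical grid)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)

print(f"RF (defaults) CV accuracy: {rf_score:.3f}")
print(f"SVM (tuned) CV accuracy: {search.best_score_:.3f} with {search.best_params_}")
```

The grid search fits one SVM per parameter combination and fold, which is where the extra computational cost of SVM optimization comes from.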