2016 Oct 31;8:60. doi: 10.1186/s13321-016-0173-z

Table 2.

Overview of results

Selection of fragments	Interpretable fragments	Fast processing (low num. features)	Best performance (RF)	Best performance (SVM)	Best performance (NB)
Unprocessed	Yes	–	Yes	Yes	–
Folded	–	Yes	–	–	–
Filtered	Yes	Yes	Yes	–	Yes

Unprocessed fragments yield random forest (RF) and support vector machine (SVM) models with good performance and retain interpretability, but at a high computational cost. Folded fragments allow fast processing, but generate inferior models and are non-interpretable due to bit collisions. Filtered fragments yield the best naive Bayes (NB) models and can be used to build RF models that are as good as those built with unprocessed fragments. Filtered fragments also retain interpretability and allow fast processing.
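The loss of interpretability caused by bit collisions can be illustrated with a minimal sketch. It assumes that fragments are identified by SMILES-like strings and folded by hashing modulo the vector length. The hash function (`zlib.crc32`), the example fragments, and the helper names `fold_fragments` and `collisions` are illustrative assumptions, not the fingerprinting scheme used in the study.

```python
import zlib
from collections import defaultdict


def fold_fragments(fragments, n_bits):
    """Assign each fragment to one of n_bits positions by hashing.

    Returns a dict mapping bit position -> list of fragments stored there.
    """
    bit_to_fragments = defaultdict(list)
    for frag in fragments:
        # crc32 is a deterministic stand-in for a fingerprint hash (assumption).
        bit = zlib.crc32(frag.encode("utf-8")) % n_bits
        bit_to_fragments[bit].append(frag)
    return dict(bit_to_fragments)


def collisions(bit_to_fragments):
    """Keep only the bits that are shared by more than one fragment."""
    return {bit: frags for bit, frags in bit_to_fragments.items() if len(frags) > 1}


# Hypothetical fragments; nine of them folded into four bits must collide.
fragments = ["c1ccccc1", "C(=O)O", "CN", "CCl", "C#N", "CO", "CS", "C=C", "CF"]
folded = fold_fragments(fragments, n_bits=4)
clashes = collisions(folded)

for bit, frags in sorted(clashes.items()):
    print(f"bit {bit} is shared by: {frags}")
```

When a model assigns importance to a shared bit, it is no longer possible to tell which of the colliding fragments is responsible, which is why folded features cannot be interpreted directly.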

In summary, unprocessed (all) fragments are a good option if there are enough computational resources to optimize SVMs and the large number of (often redundant) features does not hinder the interpretation of predictions. Otherwise, filtered fragments should be preferred.

In general, RF models yield good results without parameter tuning. However, SVM models are usually better once their parameters have been optimized (see section on parameter optimization).
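The contrast between an untuned RF and a tuned SVM can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; the synthetic data from `make_classification`, the RBF kernel, and the grid values for `C` and `gamma` are placeholders chosen for the example, not the data or settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a fragment feature matrix (assumption).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=0
)

# Random forest with default hyperparameters, i.e. no tuning.
rf = RandomForestClassifier(random_state=0)
rf_score = cross_val_score(rf, X, y, cv=3).mean()

# SVM whose C and gamma are selected by a small cross-validated grid search.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)

print(f"RF (defaults), mean CV accuracy: {rf_score:.3f}")
print(f"SVM (tuned), best CV accuracy:   {search.best_score_:.3f}")
print(f"Selected SVM parameters:         {search.best_params_}")
```

The grid search fits one SVM per parameter combination and fold, so its cost grows with the size of the grid, which is the computational overhead that the untuned RF avoids.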