Table 2.
Selection of fragments | Interpretable fragments | Fast processing (low num. features) | Best performance (RF) | Best performance (SVM) | Best performance (NB)
---|---|---|---|---|---
Unprocessed | Yes | – | Yes | Yes | –
Folded | – | Yes | – | – | –
Filtered | Yes | Yes | Yes | – | Yes
Unprocessed fragments yield random forest (RF) and support vector machine (SVM) models with good performance and retain interpretability, but incur a high computational cost. Folded fragments allow fast processing, but generate inferior models and are non-interpretable due to bit collisions. Filtered fragments yield the best naive Bayes (NB) models and can be used to build RF models that are as good as those built with unprocessed fragments. Filtered fragments also retain interpretability and allow fast processing.
In summary, unprocessed (all) fragments are a good option if there are enough computational resources to optimize SVMs and the vast number of (often redundant) features does not hinder the interpretation of predictions. Otherwise, filtered fragments should be preferred.
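The contrast between folded and filtered fragments in Table 2 can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: the fragment strings are hypothetical toy data, CRC32 stands in for whatever hash function a real fingerprint implementation uses, and the filter shown (keeping fragments that occur in at least `min_count` molecules) is only one possible criterion. The function names are likewise illustrative.

```python
import zlib
from collections import Counter

# Toy data (hypothetical): each molecule is a set of fragment identifiers.
molecules = [
    {"c1ccccc1", "C=O", "CN"},
    {"c1ccccc1", "CO", "CN"},
    {"C=O", "CCl"},
    {"c1ccccc1", "C=O", "CBr"},
]


def bit_index(fragment, n_bits):
    """Deterministically hash a fragment to a bit position (CRC32 is an assumption)."""
    return zlib.crc32(fragment.encode("utf-8")) % n_bits


def fold(fragments, n_bits):
    """Folded representation: a fixed-length bit vector, one hashed bit per fragment."""
    bits = [0] * n_bits
    for fragment in fragments:
        bits[bit_index(fragment, n_bits)] = 1
    return bits


def bit_to_fragments(molecules, n_bits):
    """Collect, for every bit position, the distinct fragments that map to it."""
    mapping = {}
    for molecule in molecules:
        for fragment in molecule:
            mapping.setdefault(bit_index(fragment, n_bits), set()).add(fragment)
    return mapping


def filter_fragments(molecules, min_count):
    """Filtered representation: keep fragments seen in at least min_count molecules."""
    counts = Counter(fragment for molecule in molecules for fragment in molecule)
    return sorted(f for f, c in counts.items() if c >= min_count)


def encode(fragments, vocabulary):
    """One column per retained fragment, so each feature keeps its identity."""
    return [1 if fragment in fragments else 0 for fragment in vocabulary]


# Folding six distinct fragments into four bits forces at least one collision,
# so a set bit can no longer be traced back to a single fragment.
collisions = {bit: frags for bit, frags in bit_to_fragments(molecules, 4).items()
              if len(frags) > 1}

# Filtering shrinks the feature space while every column still names one fragment.
vocabulary = filter_fragments(molecules, min_count=2)
filtered_matrix = [encode(molecule, vocabulary) for molecule in molecules]
```

In this sketch both representations reduce the number of features, but only the filtered one keeps a one-to-one correspondence between a column and a fragment, which is what makes predictions traceable to substructures.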
In general, RF models yield good results without parameter tuning; however, SVM models usually perform better once their parameters have been optimized (see section on parameter optimization).