2013 Jun 24;8(6):e66341. doi: 10.1371/journal.pone.0066341

Table 3. Unsupervised features were as powerful as expert-engineered features in distinguishing uric acid sequences from gout vs. leukemia.

| Classifier | AUC (training) | AUC [CI] (test) |
|---|---|---|
| First-Layer Learned Features | 0.969 | 0.972 [0.968, 0.979] |
| Second-Layer Learned Features | 0.965 | 0.972 [0.968, 0.979] |
| Expert Engineered Features | 0.968 | 0.974 [0.966, 0.981] |
| Baseline (sequence mean only) | 0.922 | 0.932 [0.922, 0.944] |

The second column gives the cross-validated performance of an Elastic Net model on the training set. The third column gives the performance on the held-out test set, with 95% confidence intervals computed using the bias-corrected and accelerated bootstrap. The nearly complete overlap of the confidence intervals indicates that the classifiers built from each of the two learned feature layers and from the expert-engineered feature set were equally useful in the supervised learning task. By contrast, the roughly 0.04 gap in test AUC between the baseline model and the other three is statistically significant (the baseline's confidence interval does not overlap any of the others) and is a respectable improvement as supervised models go. AUC: area under the receiver operating characteristic curve. CI: 95% confidence interval.
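For readers who want to see how a bias-corrected and accelerated bootstrap interval for a test-set AUC can be computed, the following is a minimal sketch using only the Python standard library. It is not the authors' code. The function names (`auc`, `bca_auc_ci`), the number of resamples, and the synthetic scores in the usage lines are all assumptions made for illustration, and the numbers it prints have no relation to the values in Table 3.

```python
import random
from statistics import NormalDist


def auc(labels, scores):
    """AUC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bca_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) for a BCa bootstrap CI on AUC.

    Hypothetical helper; assumes each class has at least two examples so
    that every leave-one-out (jackknife) sample still contains both classes.
    """
    labels, scores = list(labels), list(scores)
    rng = random.Random(seed)
    n = len(labels)
    theta_hat = auc(labels, scores)

    # Bootstrap replicates; resamples missing one class are redrawn.
    reps = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            reps.append(auc(ys, [scores[i] for i in idx]))
    reps.sort()

    norm = NormalDist()
    # Bias correction: how far the bootstrap median sits from the estimate.
    frac = sum(r < theta_hat for r in reps) / n_boot
    frac = min(max(frac, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(frac)

    # Acceleration: skewness of the jackknife (leave-one-out) estimates.
    jack = [auc(labels[:i] + labels[i + 1:], scores[:i] + scores[i + 1:])
            for i in range(n)]
    jmean = sum(jack) / n
    num = sum((jmean - j) ** 3 for j in jack)
    den = 6 * sum((jmean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(q):
        # Map a nominal quantile to its bias- and skew-adjusted quantile.
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(q):
        return reps[min(n_boot - 1, max(0, int(q * n_boot)))]

    return theta_hat, pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))


# Usage on synthetic data (illustrative only, not the study's data).
demo_rng = random.Random(42)
labels = [1] * 60 + [0] * 60
scores = ([demo_rng.gauss(1.0, 1.0) for _ in range(60)]
          + [demo_rng.gauss(0.0, 1.0) for _ in range(60)])
point, lower, upper = bca_auc_ci(labels, scores)
print(f"AUC = {point:.3f}, 95% BCa CI = [{lower:.3f}, {upper:.3f}]")
```

The plain percentile bootstrap would read the 2.5th and 97.5th percentiles of the replicates directly. The adjustment here shifts those percentiles to account for bias and skew in the bootstrap distribution, which matters for a bounded statistic such as AUC when it is close to 1.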