Table 1. Prediction error of GLM (glm.*) and GAM (gam.*) models on training and test data sets.
Model | MSE (Train) | MRD (Train) | MSE (Test) | MRD (Test) |
glm.full | 0.032 | 0.277 | 0.082 | 0.511 |
glm.step | 0.030 | 0.278 | 0.081 | 0.499 |
glm.bits | 0.032 | 0.283 | 0.082 | 0.497 |
gam.full | 0.030 | 0.264 | 0.085 | 0.534 |
gam.step | 0.030 | 0.274 | 0.081 | 0.498 |
gam.bits | 0.032 | 0.281 | 0.082 | 0.497 |
Results are shown for models with three sets of predictor variables: 1) full models, which contain all BLAST statistics (full); 2) stepwise models, which contain the BLAST statistics selected by a stepwise AIC variable selection process (step); and 3) bits models, which use only the log BLAST bit score as a predictor variable (bits). Models were assessed using both MSE and MRD (lower values are better). On the training set, the models with all predictor variables (glm.full and gam.full) have the lowest MRD within their model class (0.277 and 0.264, respectively), although their MSE (0.032 and 0.030, respectively) is no lower than that of the stepwise models. However, on the test data, models with more predictor variables do not perform significantly better than models that have bit score as the single predictor (test MSE ranges from 0.081 to 0.085 and test MRD from 0.497 to 0.534 across all six models).
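As an illustration of how the two error measures in Table 1 could be computed, the following is a minimal sketch in Python. MSE is the standard mean squared error. The table legend does not expand MRD, so the sketch assumes it denotes a mean relative difference, taken here as the average of |observed - predicted| / |observed|; that definition, the function names, and the toy values are assumptions for illustration and are not taken from the study.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mrd(observed, predicted):
    """ASSUMED definition of MRD: mean of |observed - predicted| / |observed|.

    The source does not define MRD; this is one plausible reading only.
    Observed values must be non-zero.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Toy values (hypothetical, not data from the study).
observed = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]

train_mse = mse(observed, predicted)
train_mrd = mrd(observed, predicted)
print(f"MSE = {train_mse:.3f}, MRD = {train_mrd:.3f}")
```

Applying the same two functions to held-out observations would give the test-set columns; for both measures, lower values indicate a better fit.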