Table 7.
Effect of adding similar sequences to each family on the performance of AMPS applied to the Master Data Set. Clustering was performed on the Normalised Alignment Score (NAS) rather than on SD, for efficiency with large alignments. AccSCR (Master Data Set): accuracy of AMPS alignments clustered on NAS for the Master Data Set. AccSCR (Extended data set): accuracy of alignments for the data set with additional sequences. p: significance from the Wilcoxon signed-rank paired test.
| PID Average (1) | Number of Families (2) | AccSCR, Master Data Set (3) | AccSCR, Extended data set (4) | Difference in Accuracy (4–3) | p |
| 0–10 | 21 | 21.7 | 35.3 | 13.6 | 0.00947 |
| 10–20 | 57 | 59.4 | 63.3 | 3.9 | 0.0719 |
| 20–30 | 64 | 81.2 | 82.6 | 1.4 | 0.283 |
| 30–50 | 175 | 92.2 | 92.2 | 0.0 | 0.899 |
| 50–100 | 355 | 98.9 | 98.8 | -0.1 | 0.00255 |
| Total | 672 | 89.7 | 90.5 | 0.8 | 0.238 |
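The arithmetic behind the Difference column and the Total row can be checked with a short sketch. It assumes that the Total accuracies are family-weighted means of the per-bin accuracies; the source does not state this, but the table values are consistent with it. The names `rows`, `differences` and `weighted_mean` are illustrative only and are not part of AMPS.

```python
# Rows copied from Table 7:
# (PID range, number of families, AccSCR master, AccSCR extended)
rows = [
    ("0-10", 21, 21.7, 35.3),
    ("10-20", 57, 59.4, 63.3),
    ("20-30", 64, 81.2, 82.6),
    ("30-50", 175, 92.2, 92.2),
    ("50-100", 355, 98.9, 98.8),
]


def differences(rows):
    """Extended minus master accuracy per PID bin, rounded to one decimal."""
    return [round(ext - master, 1) for _, _, master, ext in rows]


def weighted_mean(rows, col):
    """Mean of column `col` weighted by the number of families per bin.

    Assumption: this is how the Total row was derived; the source
    does not say so explicitly.
    """
    total_families = sum(r[1] for r in rows)
    weighted_sum = sum(r[1] * r[col] for r in rows)
    return round(weighted_sum / total_families, 1)


diffs = differences(rows)
total_families = sum(r[1] for r in rows)
master_total = weighted_mean(rows, 2)
extended_total = weighted_mean(rows, 3)

print(diffs)
print(total_families, master_total, extended_total)
```

Under that assumption the sketch reproduces the Difference column and the Total row of the table. It does not reproduce the p values, which would require the per-family accuracies that are not listed here.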