Table 3. Alignment accuracy [AUC for F measure (%)]

e2msa methods, by parameterization:

| Method | Global homology set, short | Global homology set, long | Global homology set, optimal | Local homology set, short | Local homology set, long | Local homology set, optimal |
|---|---|---|---|---|---|---|
| e2msa.afg | 71.4 | **80.4** | 80.3 | 68.2 | 68.2 | **73.6** |
| e2msa.aga | 71.4 | **80.4** | 80.1 | 68.2 | 67.3 | **73.6** |
| e2msa.aif | 71.3 | **80.4** | 80.2 | 68.1 | 68.3 | **73.3** |
| e2msa.tkf92 | 71.2 | **80.0** | 79.9 | 68.1 | 68.2 | **73.4** |
| e2msa.afr | 71.7 | **80.0** | 79.8 | 68.1 | 68.2 | **73.3** |
| e2msa.aali | 71.0 | **78.7** | 78.6 | 67.9 | 66.4 | **72.7** |
| e2msa.tkf91 | 69.5 | **75.4** | 74.5 | 66.2 | 69.1 | **70.7** |

Other methods:

| Method | Global homology set | Local homology set |
|---|---|---|
| PHMMER (no filters) | 78.7 | 72.9 |
| SSEARCH (BLOSUM62, -11/-1) | 80.0 | 71.7 |
| NCBIBLAST | 78.9 | 68.4 |
| MSAProbs | 81.7 | NA |
| MUSCLE | 80.8 | NA |

The “Global homology set” is the one used in Fig. 7, and the “Local homology set” is the one used in Fig. 8. The e2msa algorithm was run in local mode with three different parameterizations: two at a fixed branch length (a short-branch and a long-branch parameterization, introduced in Fig. 7), and a variable optimal-time parameterization that uses, for each homology, the branch length that optimizes the probability of the sequences given the model. The rate parameters for all evolutionary models were obtained using the same training set, “Pfam.seed.S1000.sto”. For all experiments, alignments are binned in 5% identity groups, and the total F measure for a bin is calculated by adding all alignments in that bin. To provide a single number, we report the area under the curve (AUC) for the F measure of alignments covering all identity ranges. For comparison, we provide results for other standard methods. Methods have been ranked by their combined performance in both sets. Methods such as MSAProbs and MUSCLE work only in “global” alignment mode and are not appropriate for detecting local homologies.
In bold, we indicate the best-performing of the three alternative parameterizations for each e2msa method and homology set.
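
The binning and AUC summary described in the caption can be illustrated with a minimal Python sketch. It rests on assumptions that the caption does not spell out: "adding all alignments" is read here as pooling the counts of correctly aligned, predicted, and reference residue pairs within each 5% identity bin; the F measure is taken as the harmonic mean of sensitivity and positive predictive value; and the AUC is approximated with the trapezoid rule over bin centers, normalized by the identity span. The function name `binned_f_auc` and the input tuple layout are hypothetical and do not come from the e2msa software.

```python
from collections import defaultdict


def binned_f_auc(alignments, bin_width=5.0):
    """Summarize alignment accuracy as a normalized AUC of the F measure (%).

    alignments: iterable of (percent_identity, n_correct, n_predicted, n_reference),
    where the three counts are aligned residue pairs (assumed input layout).
    """
    n_bins = int(100 // bin_width)
    totals = defaultdict(lambda: [0, 0, 0])  # bin index -> pooled counts

    for identity, n_correct, n_predicted, n_reference in alignments:
        # Clamp 100% identity into the last bin.
        b = min(int(identity // bin_width), n_bins - 1)
        totals[b][0] += n_correct
        totals[b][1] += n_predicted
        totals[b][2] += n_reference

    bins = sorted(totals)
    centers = [(b + 0.5) * bin_width for b in bins]

    # Pooled F per bin: 2*TP / (predicted + reference), which equals the
    # harmonic mean of sensitivity (TP/reference) and PPV (TP/predicted).
    f_values = []
    for b in bins:
        tp, pred, ref = totals[b]
        f_values.append(2.0 * tp / (pred + ref) if (pred + ref) else 0.0)

    if not bins:
        raise ValueError("no alignments given")
    if len(bins) == 1:
        return 100.0 * f_values[0]

    # Trapezoid rule over occupied bins, normalized by the identity span
    # so the result is on the same 0-100 scale as the table (assumption).
    area = sum(
        0.5 * (f_values[i] + f_values[i + 1]) * (centers[i + 1] - centers[i])
        for i in range(len(bins) - 1)
    )
    return 100.0 * area / (centers[-1] - centers[0])


if __name__ == "__main__":
    toy = [(12, 5, 10, 10), (13, 5, 10, 10), (47, 9, 10, 10)]
    print(round(binned_f_auc(toy), 1))
```

Pooling counts before computing F weights each bin by its number of residue pairs rather than by its number of alignments; averaging per-alignment F values within a bin would be an equally plausible reading and would give different numbers.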