Table 1. Performance of composition-adjusted substitution matrices.
| Sequence pairs | Organisms compared | No. of sequence pairs | Mean BLOSUM-62 bit score* | Background frequencies specified | Median absolute change in bit score* with respect to BLOSUM-62 | Median relative change (%) in bit score* with respect to BLOSUM-62 | Cases improved (%) | Cases (%) with statistical significance improved/worsened by a factor >10† |
|---|---|---|---|---|---|---|---|---|
| Related | C. tetani and M. tuberculosis | 40 | 68.3 | Organism | +1.6 | +2.7 | 58 | 20/8 |
| | | | | Sequence‡ | +2.3 | +3.3 | 85 | 38/3 |
| | B. subtilis and L. lactis | 37 | 59.8 | Organism | +1.1 | +1.8 | 84 | 16/3 |
| | | | | Sequence‡ | +2.1 | +3.6 | 95 | 11/3 |
| | M. tuberculosis and S. coelicolor | 34 | 58.6 | Organism | +1.4 | +2.6 | 76 | 24/3 |
| | | | | Sequence‡ | +2.7 | +4.1 | 100 | 32/0 |
| Unrelated (negative control) | C. tetani and M. tuberculosis | 1,560 | 16.7 | Organism | -0.02 | -0.1 | 49 | 0.4/0.1 |
| | | | | Sequence‡ | -0.05 | -0.3 | 47 | 0.6/0.4 |
| | B. subtilis and L. lactis | 1,332 | 15.7 | Organism | +0.00 | +0.0 | 50 | 0.0/0.0 |
| | | | | Sequence‡ | +0.04 | +0.3 | 52 | 0.2/0.4 |
| | M. tuberculosis and S. coelicolor | 1,122 | 16.4 | Organism | +0.05 | +0.3 | 53 | 0.0/0.1 |
| | | | | Sequence‡ | +0.06 | +0.4 | 53 | 0.6/0.2 |
| Structural | Various | 32 | 50.4 | Sequence‡ | +1.3 | +3.2 | 72 | 22/0 |

*Bit scores for all comparisons were calculated by using composition-based statistics (19) and experimentally determined gapped statistical parameters (18, 19), as is now standard in BLAST (12, 13). All matrices were scaled to have ungapped λ = 0.00635 and were used in conjunction with gap costs of -550 - 50k for a gap of length k.

†Equivalent to a change of >3.322 bits.

‡Twenty pseudocounts, proportional to the amino acid frequencies implicit in BLOSUM-62, were added to the actual amino acid counts from the proteins compared.
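
The last two footnotes each describe a small calculation, which can be sketched as follows. Because an E-value scales as 2^(-S') for a bit score S', a 10-fold change in statistical significance corresponds to log2(10) ≈ 3.322 bits. The pseudocount step mixes observed residue counts with 20 pseudocounts distributed according to a background distribution. The function names, the per-sequence application, the handling of residues outside the alphabet, and the toy two-letter background below are illustrative assumptions, not the authors' implementation; the actual BLOSUM-62 implicit frequencies are not reproduced here and would be passed in as `background`.

```python
import math
from collections import Counter


def factor_to_bits(factor):
    """Bit-score change equivalent to a given fold change in E-value."""
    return math.log2(factor)


def pseudocount_frequencies(sequence, background, total_pseudocounts=20.0):
    """Estimate residue frequencies from observed counts plus pseudocounts.

    `background` maps each residue to its background frequency (assumed to
    sum to 1). Residues not present in `background` are ignored, which is
    an assumption of this sketch.
    """
    counts = Counter(sequence)
    observed = sum(counts[aa] for aa in background)
    denom = observed + total_pseudocounts
    return {
        aa: (counts[aa] + total_pseudocounts * p) / denom
        for aa, p in background.items()
    }


if __name__ == "__main__":
    print(round(factor_to_bits(10), 3))
    # Toy two-letter alphabet, used only to keep the arithmetic visible.
    toy_background = {"A": 0.5, "C": 0.5}
    print(pseudocount_frequencies("AAAC", toy_background))
```

With a real protein, `background` would hold 20 entries, and the resulting frequencies would serve as the sequence-specific background frequencies for the rows labeled "Sequence" in the table.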