Table 5. The estimation of β using α.
| Matrix | BLOSUM-45 | BLOSUM-62 | BLOSUM-80 | PAM-70 | PAM-30 |
|---|---|---|---|---|---|
| αu | 0.9113 | 0.7916 | 0.5222 | 0.3250 | 0.1938 |
| βu | −5.7 | −3.2 | −1.6 | −0.7 | −0.3 |
| Gap existence | 14 | 11 | 10 | 10 | 9 |
| Gap extension | 2 | 1 | 1 | 1 | 1 |
| αg | 1.92 ± 0.03 | 1.90 ± 0.02 | 1.07 ± 0.02 | 0.70 ± 0.01 | 0.48 ± 0.01 |
| βg | −37.2 ± 1.6 | −29.7 ± 1.0 | −12.5 ± 0.8 | −8.1 ± 0.5 | −5.9 ± 0.3 |
| 2G(αu − αg) + βu | −38.0 ± 1.0 | −29.8 ± 0.5 | −13.7 ± 0.4 | −9.0 ± 0.3 | −6.0 ± 0.2 |
Estimates for α and β were obtained by linear regression of alignment length versus score, for scores attaining at least a cutoff value. Sequences were generated using a set of standard amino acid frequencies (26). Substitution scores were from either the BLOSUM (27) or PAM (37,38) series. Affine gap scores charged an existence penalty for each gap, and an extension penalty for each residue within a gap. Cutoff scores were chosen sufficiently high to avoid detectable bias in estimating α. Sufficient data points were generated to estimate α with a standard error of <2%. For each scoring system studied, this required over 500 pairs of random sequences, recording islands anchored within the central 5000 × 5000 square of each pairwise comparison. Borders of length 1000 were used for the BLOSUM-45 and BLOSUM-62 scoring systems, and of length 500 for all others.
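The two computations behind this table can be illustrated with a short sketch. It is not the authors' code and makes several assumptions: it models alignment length as a linear function of score (length ≈ α·score + β) fitted by ordinary least squares to alignments scoring at or above a cutoff, and it takes G to be the gap existence penalty plus one extension penalty (the cost of a length-1 gap), an interpretation inferred because it reproduces the final row of the table. The function names `fit_alpha_beta` and `predict_beta_g` are hypothetical, and the sketch does not reproduce the island-based sampling, border sizes or random-sequence generation described above.

```python
# Minimal sketch (not the authors' implementation). Assumptions:
#   * linear model: length ≈ alpha * score + beta, fitted by ordinary least squares
#     to alignments whose score is >= a cutoff;
#   * G = gap existence penalty + gap extension penalty (cost of a length-1 gap).


def fit_alpha_beta(scores, lengths, cutoff):
    """Least-squares fit of alignment length on score, using only scores >= cutoff."""
    pairs = [(s, l) for s, l in zip(scores, lengths) if s >= cutoff]
    if len(pairs) < 2:
        raise ValueError("need at least two alignments at or above the cutoff")
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_l = sum(l for _, l in pairs) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in pairs)
    if sxx == 0:
        raise ValueError("scores at or above the cutoff must not all be equal")
    sxy = sum((s - mean_s) * (l - mean_l) for s, l in pairs)
    alpha = sxy / sxx                    # slope: length per unit score
    beta = mean_l - alpha * mean_s       # intercept
    return alpha, beta


def predict_beta_g(alpha_u, beta_u, gap_open, gap_extend, alpha_g):
    """Predicted gapped beta, 2G(alpha_u - alpha_g) + beta_u, with G = open + extend."""
    G = gap_open + gap_extend
    return 2 * G * (alpha_u - alpha_g) + beta_u


# Values transcribed from Table 5: (alpha_u, beta_u, gap_open, gap_extend, alpha_g)
TABLE5 = {
    "BLOSUM-45": (0.9113, -5.7, 14, 2, 1.92),
    "BLOSUM-62": (0.7916, -3.2, 11, 1, 1.90),
    "BLOSUM-80": (0.5222, -1.6, 10, 1, 1.07),
    "PAM-70": (0.3250, -0.7, 10, 1, 0.70),
    "PAM-30": (0.1938, -0.3, 9, 1, 0.48),
}

if __name__ == "__main__":
    for name, params in TABLE5.items():
        print(f"{name}: predicted beta_g = {predict_beta_g(*params):.2f}")
```

Using the two-decimal αg values shown in the table, the predictions differ from the final row by at most about 0.05, a gap consistent with αg having been rounded for display.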