Repeat II in the Cag7 protein extends continuously from amino acid 477 to amino acid 1383. The sequences of α, β, δ, ɛ, λ, and μ are aligned, and the consensus sequences are displayed at the top. When two residues appear the same number of times at one position, both are displayed in the consensus sequence, indicated by a colon. Note that the sequences of α, β, λ, and μ start with a cysteine. Lowercase letters represent nonaligned residues. The ★ underneath the K marks the terminal point of ORF14 in cosmid 36, and the @ underneath the m marks the start point of ORF13 in cosmid 36. The conservation index (defined below) among the sequences of α is 0.82; among the sequences of β, 0.79; among the sequences of δ, 0.81; among the sequences of ɛ, 0.60; among the sequences of λ, 0.68; and among the sequences of μ, 0.78. The conservation index (
10) provides a means of quantifying similarity among aligned sequences. A similarity score between a pair of amino acids is determined according to a similarity substitution matrix, for example BLOSUM62 (11). Normalized scores for an amino acid pair (a and b) are calculated by the formula
where S(a, b), S(a, a), and S(b, b) are similarity values given by the BLOSUM62 matrix. For each position (column) of these sequences, the conservation index is calculated by taking the average normalized score from all residue pairs at that position.
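
The per-column averaging described above can be sketched in Python. The normalization formula itself is not reproduced in this caption, so the sketch assumes one common choice, S(a, b) / sqrt(S(a, a) · S(b, b)); this is an assumption for illustration, not necessarily the formula used here. The function names and the small hand-entered excerpt of BLOSUM62 values are likewise illustrative and should be checked against a full reference matrix before any real use.

```python
from itertools import combinations
from math import sqrt

# Hand-entered excerpt of BLOSUM62 covering only L, I, and V.
# ASSUMPTION: illustrative values; verify against a full reference matrix.
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("L", "V"): 1, ("I", "V"): 3,
}


def similarity(a, b):
    """Look up S(a, b); the matrix is symmetric, so sort the pair first."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


def normalized_score(a, b):
    """ASSUMED normalization: S(a, b) / sqrt(S(a, a) * S(b, b)).

    The caption's own formula is missing, so this is a stand-in.
    """
    return similarity(a, b) / sqrt(similarity(a, a) * similarity(b, b))


def column_conservation(column):
    """Average normalized score over all residue pairs in one column."""
    pairs = list(combinations(column, 2))
    return sum(normalized_score(a, b) for a, b in pairs) / len(pairs)


def conservation_index(sequences):
    """Per-column conservation for equal-length, gap-free aligned sequences."""
    return [column_conservation(col) for col in zip(*sequences)]


if __name__ == "__main__":
    # Three aligned toy sequences: column 1 is identical, column 2 is mixed.
    print(conservation_index(["LL", "LI", "LV"]))
```

Under the assumed normalization, a column of identical residues scores 1, and columns with dissimilar residues score lower. The sketch does not handle gaps or nonaligned (lowercase) residues, and how the per-column values are combined into the single index reported for each repeat type is not specified in the caption.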