Table 2. A sample of the quantitative diversity analysis for the HA protein of the avian influenza A (H5N1) subtype.
Protein | Symbol (a) | Nonamer position (a) | No. of sequences (b) | H(x) (c) | Index (d) sequence | Index (d) [%] | Variants (e) total [%] | Major (f) variant [%] | Minor (g) variants [%] | Unique (h) variants [%] | Nonatypes (i) [%]
---|---|---|---|---|---|---|---|---|---|---|---
HA | # | 158–166 | 2353 | 0.6 | SFFRNVVWL | 92 | 8 | 5 | 2 | <1 | 1
HA | & | 159–167 | 2352 | 1.1 | FFRNVVWLI | 83 | 17 | 8 | 9 | 1 | 1
HA | & | 160–168 | 2353 | 1.1 | FRNVVWLIK | 83 | 17 | 7 | 9 | 1 | 1
HA | & | 161–169 | 2352 | 0.9 | RNVVWLIKK | 88 | 12 | 7 | 4 | 1 | 1
HA | & | 162–170 | 2351 | 1.7 | NVVWLIKKN | 64 | 36 | 23 | 12 | 1 | 2
HA | & | 163–171 | 2351 | 2.8 | VVWLIKKNS | 30 | 70 | 26 | 43 | 1 | 2
HA | & | 164–172 | 2350 | 3.5 | VWLIKKDNA | 21 | 79 | 20 | 57 | 1 | 3
HA | & | 165–173 | 2350 | 3.4 | WLIKKDNAY | 21 | 79 | 21 | 57 | 1 | 3
HA | & | 166–174 | 2350 | 3.4 | LIKKDNAYP | 21 | 79 | 21 | 57 | 1 | 3
HA | & | 167–175 | 2349 | 3.5 | IKKDNAYPT | 21 | 79 | 21 | 57 | 2 | 4
HA | & | 168–176 | 2349 | 3.2 | KKDNAYPTI | 24 | 76 | 21 | 54 | 1 | 3
HA | & | 169–177 | 2349 | 3.2 | KDNAYPTIK | 24 | 77 | 21 | 54 | 1 | 3
HA | + | 170–178 | 2350 | 4.1 | NSTYPTIKR | 18 | 82 | 14 | 66 | 2 | 4
Notes.
All percentages are shown to the nearest whole number.
(a) Amino acid number at the start and end of the nonamer position in the protein alignment. The symbol # denotes a highly conserved nonamer position (index incidence ≥ 90%), & denotes a mixed-variable position (index incidence between 20% and 90%), and + denotes a highly diverse nonamer position (index incidence ≤ 20%). See Fig. S1 for the definition of diversity motifs.
(b) Total number of protein sequences analysed at the aligned nonamer position; the number differs between nonamer positions because the alignment includes both partial and full-length sequences.
(c) Shannon’s nonamer entropy, H(x), which indicates the level of diversity of the nonamer sequences at the position (see Fig. 1 for details).
(d) The index nonamer is the most prevalent sequence at the position.
(e) Variants are nonamer sequences that differ by one or more amino acids from the index sequence.
(f) The major variant is the second most common sequence at the position.
(g) Minor variants are the remaining distinct nonamer sequences that each occur more than once, with an incidence lower than, or occasionally equal to, that of the major variant.
(h) Unique variants are nonamer sequences that are observed only once at the position.
(i) Nonatypes are distinct sequences among the variants for a given position; for example, position 170–178 had 103 distinct variant nonamer sequences (Data S1) out of a total of 2,350 sequences, so the percentage of nonatypes for this position is ∼4.4% ((103/2,350) × 100%).
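
The measures defined in the notes can be reproduced from the list of nonamers observed at one aligned position. The following is a minimal Python sketch, not the authors' pipeline. The function names `motif_symbol` and `nonamer_diversity` and the toy sequences are hypothetical. The entropy is plain Shannon entropy in bits with no sample-size correction, which may differ from the procedure described in Fig. 1. How ties are broken, and how a major variant seen only once is treated, are assumptions of this sketch.

```python
from collections import Counter
from math import log2


def motif_symbol(index_pct):
    """Map index incidence (%) to the table symbol, using the thresholds in the notes."""
    if index_pct >= 90:
        return "#"  # highly conserved position
    if index_pct <= 20:
        return "+"  # highly diverse position
    return "&"      # mixed-variable position


def nonamer_diversity(nonamers):
    """Summarise one aligned nonamer position (hypothetical helper)."""
    total = len(nonamers)
    if total == 0:
        raise ValueError("no sequences at this position")

    # most_common() sorts by count, descending; ties keep first-seen order.
    ranked = Counter(nonamers).most_common()
    index_seq, index_n = ranked[0]   # index: most prevalent sequence
    variants = ranked[1:]            # everything that differs from the index

    # Major variant: the second most common sequence overall.
    major_n = variants[0][1] if variants else 0
    # Assumption: the major variant is set aside first, then the remaining
    # variants are split into minor (seen more than once) and unique (seen once).
    minor_n = sum(n for _, n in variants[1:] if n > 1)
    unique_n = sum(n for _, n in variants[1:] if n == 1)

    def pct(n):
        return 100.0 * n / total

    # Plain Shannon entropy in bits over all distinct nonamers (assumption:
    # no correction for sample size is applied here).
    entropy = -sum((n / total) * log2(n / total) for _, n in ranked)

    return {
        "n": total,
        "entropy": entropy,
        "index": index_seq,
        "index_pct": pct(index_n),
        "variants_pct": pct(total - index_n),
        "major_pct": pct(major_n),
        "minor_pct": pct(minor_n),
        "unique_pct": pct(unique_n),
        # Nonatypes: distinct variant sequences as a share of all sequences.
        "nonatypes_pct": pct(len(variants)),
        "motif": motif_symbol(pct(index_n)),
    }


if __name__ == "__main__":
    # Made-up toy data, not taken from the alignment behind Table 2.
    toy = (["SFFRNVVWL"] * 12 + ["SFFRNVVWI"] * 4 + ["SFFRNVVWM"] * 2
           + ["SFFRNVVWA", "SFFRNVVWC"])
    for key, value in nonamer_diversity(toy).items():
        print(key, value)
```

Because the table rounds every percentage to the nearest whole number, values computed this way will match the printed columns only after rounding, and the rounded major, minor and unique percentages need not sum exactly to the rounded variant total.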