2020 May 27;7:e7954. doi: 10.7717/peerj.7954

Table 2. A sample of the quantitative diversity analysis for HA protein of avian Influenza A (H5N1) subtype.

The full data is available in Table S2. (1)

Protein  Aligned nonamers       H(x) (c)  Index (d)       Variants (e) [%]                         Nonatypes (i) [%]
         Position (a)  No. (b)            Sequence   [%]  Total  Major (f)  Minor (g)  Unique (h)
HA       # 158–166     2353     0.6       SFFRNVVWL  92   8      5          2          <1          1
         & 159–167     2352     1.1       FFRNVVWLI  83   17     8          9          1           1
         & 160–168     2353     1.1       FRNVVWLIK  83   17     7          9          1           1
         & 161–169     2352     0.9       RNVVWLIKK  88   12     7          4          1           1
         & 162–170     2351     1.7       NVVWLIKKN  64   36     23         12         1           2
         & 163–171     2351     2.8       VVWLIKKNS  30   70     26         43         1           2
         & 164–172     2350     3.5       VWLIKKDNA  21   79     20         57         1           3
         & 165–173     2350     3.4       WLIKKDNAY  21   79     21         57         1           3
         & 166–174     2350     3.4       LIKKDNAYP  21   79     21         57         1           3
         & 167–175     2349     3.5       IKKDNAYPT  21   79     21         57         2           4
         & 168–176     2349     3.2       KKDNAYPTI  24   76     21         54         1           3
         & 169–177     2349     3.2       KDNAYPTIK  24   77     21         54         1           3
         + 170–178     2350     4.1       NSTYPTIKR  18   82     14         66         2           4

Notes.

(1) All percentages are shown to the nearest whole number.

(a) Amino acid number at the start and end of the nonamer position in the protein alignment. The symbol # denotes a highly conserved nonamer position (index incidence ≥ 90%), & denotes a mixed-variable position (index incidence between 20% and 90%), and + denotes a highly diverse nonamer position (index incidence ≤ 20%). See Fig. S1 for the definition of diversity motifs.
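The three symbols can be read as a simple threshold rule on the index incidence. The following is a minimal sketch of that rule, not the authors' code; the function name `classify_position` is hypothetical, and the inputs are the rounded index percentages from three rows of the table.

```python
def classify_position(index_incidence_pct: float) -> str:
    """Return the symbol for a nonamer position from its index incidence (%)."""
    if index_incidence_pct >= 90:
        return "#"  # highly conserved
    if index_incidence_pct <= 20:
        return "+"  # highly diverse
    return "&"      # mixed-variable (strictly between 20% and 90%)


# Rounded index incidences taken from three rows of the table.
index_incidence = {"158–166": 92, "163–171": 30, "170–178": 18}
symbols = {pos: classify_position(pct) for pos, pct in index_incidence.items()}
print(symbols)
```

Because the table reports rounded percentages, a position whose true incidence lies very close to 90% or 20% could be classified differently from the unrounded data.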

(b) Total number of protein sequences analysed at the aligned nonamer position; the number differs between nonamer positions because the alignment included both partial and full-length sequences.
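To illustrate why the count can vary from one position to the next, here is a small sketch that slides a nine-residue window along a toy alignment and counts the sequences that supply a complete nonamer at each start position. It assumes that missing residues are written as `-` and that any window containing a gap is excluded; both are assumptions for illustration, as is the function name `count_sequences_per_window`.

```python
def count_sequences_per_window(aligned, k=9, gap="-"):
    """Count, per 1-based start position, the sequences with a gap-free k-mer."""
    length = max(len(seq) for seq in aligned)
    counts = {}
    for start in range(length - k + 1):
        n = 0
        for seq in aligned:
            window = seq[start:start + k]
            # Skip windows that are truncated or contain a gap character.
            if len(window) == k and gap not in window:
                n += 1
        counts[start + 1] = n
    return counts


# Toy alignment: one full-length and two partial sequences (hypothetical data).
toy_alignment = [
    "SFFRNVVWLIK",
    "SFFRNVVWL--",
    "--FRNVVWLIK",
]
print(count_sequences_per_window(toy_alignment))
```

In this toy case the first and third windows are covered by two sequences each and the second by only one, mirroring the small differences in the No. column.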

(c) Shannon’s nonamer entropy, H(x), which indicates the level of diversity of the nonamer sequences at the position (see Fig. 1 for details).
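As a rough illustration of the entropy measure, the sketch below computes the standard Shannon entropy of the nonamer sequences observed at one position. It assumes a base-2 logarithm and applies no correction for sample size; the exact formulation used in the paper is the one given in its Fig. 1, which is not reproduced here. The function name `nonamer_entropy` is hypothetical.

```python
import math
from collections import Counter


def nonamer_entropy(nonamers):
    """Shannon entropy (in bits) of the nonamer sequences at one position."""
    counts = Counter(nonamers)
    total = sum(counts.values())
    # H(x) = -sum(p * log2(p)) over the distinct nonamers at the position.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A fully conserved position has zero entropy; diversity raises it.
conserved = ["SFFRNVVWL"] * 4
mixed = ["SFFRNVVWL", "SFFRNVVWL", "SFFRNVVWI", "SFFRNVVWI"]
print(nonamer_entropy(conserved), nonamer_entropy(mixed))
```

With two equally frequent nonamers the entropy is 1 bit, and it grows as more distinct sequences share the position.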

(d) The index nonamer is the most prevalent sequence at the position.

(e) Variants are nonamer sequences that differ by one or more amino acids from the index sequence.

(f) The major variant is the second most common sequence at the position.

(g) Minor variants are the remaining distinct nonamer sequences that each occur more than once, with an incidence lower than, or occasionally equal to, that of the major variant.

(h) Unique variants are nonamer sequences that are observed only once at the position.
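The definitions of index, major, minor and unique sequences can be sketched as a ranking of the nonamers observed at a position. This is an illustrative reading of the footnotes, not the authors' implementation: the function name `variant_breakdown` and the toy sequences are hypothetical, ties are broken by order of first appearance, and the second most common sequence is treated as the major variant even if it occurs only once.

```python
from collections import Counter


def variant_breakdown(nonamers):
    """Return the percentage of index, major, minor and unique sequences."""
    total = len(nonamers)
    ranked = Counter(nonamers).most_common()  # sorted by descending count
    index_seq, index_n = ranked[0]            # most prevalent sequence
    variants = ranked[1:]                     # everything that is not the index
    major_n = variants[0][1] if variants else 0
    rest = variants[1:]
    minor_n = sum(n for _, n in rest if n > 1)    # repeated, below the major
    unique_n = sum(n for _, n in rest if n == 1)  # seen exactly once

    def pct(n):
        return 100.0 * n / total

    return {
        "index_sequence": index_seq,
        "index": pct(index_n),
        "total_variants": pct(total - index_n),
        "major": pct(major_n),
        "minor": pct(minor_n),
        "unique": pct(unique_n),
    }


# Toy position with 20 sequences (hypothetical data, not from the paper).
toy = (["NSTYPTIKR"] * 12 + ["NSTYPTIKK"] * 4 + ["DSTYPTIKR"] * 2
       + ["NSAYPTIKR", "NSTYPTIRR"])
print(variant_breakdown(toy))
```

For the toy position the index accounts for 60% of sequences and the variants for 40%, split into 20% major, 10% minor and 10% unique, so the index and total-variant percentages sum to 100% as in the table.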

(i) Nonatypes are distinct sequences among the variants for a given position; for example, the position 170–178 had 103 distinct nonamer sequences (Data S1; from a total of 2,350 sequences), therefore the percentage of nonatypes for this position is ∼4.4% ((103/2,350) × 100%).
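The arithmetic in this footnote can be checked directly. The short sketch below uses only the two numbers quoted for position 170–178; the function name `nonatype_percentage` is hypothetical.

```python
def nonatype_percentage(distinct_variants: int, total_sequences: int) -> float:
    """Percentage of distinct variant sequences among all sequences at a position."""
    return 100.0 * distinct_variants / total_sequences


# Position 170–178: 103 distinct nonamer sequences out of 2,350 sequences.
pct = nonatype_percentage(103, 2350)
print(f"{pct:.1f}%")  # prints 4.4%
```

Rounded to the nearest whole number, this gives the 4 shown in the Nonatypes column for that position.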