. 2020 May 27;7:e7954. doi: 10.7717/peerj.7954

Table 2. A sample of the quantitative diversity analysis of the HA protein of the avian influenza A (H5N1) subtype.

The full data are available in Table S2.

Protein	Motif^a	Aligned nonamer position^a	No. of sequences^b	H(x)^c	Index sequence^d	Index (%)	Variants, total (%)^e	Major (%)^f	Minor (%)^g	Unique (%)^h	Nonatypes (%)^i
HA	#	158–166	2353	0.6	SFFRNVVWL	92	8	5	2	<1	1
	&	159–167	2352	1.1	FFRNVVWLI	83	17	8	9	1	1
	&	160–168	2353	1.1	FRNVVWLIK	83	17	7	9	1	1
	&	161–169	2352	0.9	RNVVWLIKK	88	12	7	4	1	1
	&	162–170	2351	1.7	NVVWLIKKN	64	36	23	12	1	2
	&	163–171	2351	2.8	VVWLIKKNS	30	70	26	43	1	2
	&	164–172	2350	3.5	VWLIKKDNA	21	79	20	57	1	3
	&	165–173	2350	3.4	WLIKKDNAY	21	79	21	57	1	3
	&	166–174	2350	3.4	LIKKDNAYP	21	79	21	57	1	3
	&	167–175	2349	3.5	IKKDNAYPT	21	79	21	57	2	4
	&	168–176	2349	3.2	KKDNAYPTI	24	76	21	54	1	3
	&	169–177	2349	3.2	KDNAYPTIK	24	77	21	54	1	3
	+	170–178	2350	4.1	NSTYPTIKR	18	82	14	66	2	4

Notes.

All percentages are shown to the nearest whole number.

^a Amino acid positions at the start and end of the nonamer in the protein alignment. The symbol # denotes a highly conserved nonamer position (index incidence ≥ 90%), & denotes a mixed-variable position (index incidence between 20% and 90%), and + denotes a highly diverse nonamer position (index incidence ≤ 20%). See Fig. S1 for the definition of diversity motifs.

^b Total number of protein sequences analysed at the aligned nonamer position; the number differs between positions because both partial and full-length sequences were included in the alignment.

^c Shannon’s nonamer entropy, H(x), which indicates the level of diversity of the nonamer sequences at the position (see Fig. 1 for details).

^d The index nonamer is the most prevalent sequence at the position.

^e Variants are nonamer sequences that differ from the index sequence by one or more amino acids.

^f The major variant is the second most common sequence at the position.

^g Minor variants are the remaining repeated nonamer sequences, each occurring more than once and with an incidence lower than, or occasionally equal to, that of the major variant.

^h Unique variants are nonamer sequences that are observed only once at the position.

^i Nonatypes are distinct sequences among the variants at a given position. For example, position 170–178 had 103 distinct nonamer sequences out of a total of 2,350 sequences (Data S1), so the percentage of nonatypes for this position is ∼4.4% ((103/2,350) × 100%).
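As a minimal sketch, the column definitions in these notes can be computed for one aligned position in Python. Several details are assumptions not stated in the table: a window containing a gap character ("-") is taken to mark a partial sequence and is skipped, H(x) is computed as plain Shannon entropy in bits (the study's exact estimator is described in Fig. 1), frequency ties are broken by first occurrence, and nonatypes are counted over all distinct sequences at the position, following the 170–178 example. The helper names (`nonamers_at`, `entropy_bits`, `motif`, `summarise_position`) and the toy alignment are hypothetical; this is not the authors' pipeline.

```python
import math
from collections import Counter


def nonamers_at(alignment, start, gap_chars="-"):
    """Return the nonamers at 1-based alignment columns start..start+8.

    Assumption: a window containing a gap character comes from a partial
    sequence that does not cover this position, so it is skipped.
    """
    window = slice(start - 1, start + 8)
    nonamers = []
    for seq in alignment:
        nonamer = seq[window]
        if len(nonamer) == 9 and not any(c in gap_chars for c in nonamer):
            nonamers.append(nonamer)
    return nonamers


def entropy_bits(counts):
    """Plain Shannon entropy in bits, H(x) = -sum p log2 p, over distinct nonamers.

    Assumption: the study's exact estimator (Fig. 1) may differ.
    """
    total = sum(counts.values())
    # p * log2(1/p) equals -p * log2(p); this form avoids returning -0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def motif(index_pct):
    """Diversity motif symbol from the (unrounded) index incidence in %."""
    if index_pct >= 90:
        return "#"  # highly conserved
    if index_pct <= 20:
        return "+"  # highly diverse
    return "&"      # mixed-variable


def summarise_position(nonamers):
    """Compute Table 2-style columns for one aligned nonamer position."""
    if not nonamers:
        raise ValueError("no nonamers at this position")
    counts = Counter(nonamers)
    total = len(nonamers)

    def pct(n):
        return 100 * n / total

    ranked = counts.most_common()  # descending count; ties keep first-seen order
    index_seq, index_n = ranked[0]
    variants = ranked[1:]          # every sequence that differs from the index
    # Assumption: if the second-ranked sequence occurs only once, it is
    # counted as unique and no major variant is reported.
    major_n = variants[0][1] if variants and variants[0][1] > 1 else 0
    minor_n = sum(n for _, n in variants if n > 1) - major_n
    unique_n = sum(n for _, n in variants if n == 1)
    index_pct = pct(index_n)
    return {
        "motif": motif(index_pct),
        "no": total,
        "entropy": entropy_bits(counts),
        "index": index_seq,
        "index_pct": index_pct,
        "variants_pct": pct(total - index_n),
        "major_pct": pct(major_n),
        "minor_pct": pct(minor_n),
        "unique_pct": pct(unique_n),
        # Assumption: all distinct sequences are counted, as in the 170-178 example
        "nonatypes_pct": pct(len(counts)),
    }


# Hypothetical toy alignment; the last sequence is partial at position 1.
toy_alignment = (["SFFRNVVWLIK"] * 5 + ["SFFRNIVWLIK"] * 2
                 + ["SFFRNVIWLIK"] * 2 + ["SFFKNVVWLIK", "--FRNVVWLIK"])
print(summarise_position(nonamers_at(toy_alignment, 1)))

# The nonatype arithmetic from note i: 103 distinct nonamers out of 2,350.
print(f"{100 * 103 / 2350:.1f}%")  # 4.4%
```

Passing the unrounded index incidence to `motif` matters near the 20% and 90% cut-offs, because the table itself reports percentages rounded to whole numbers.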