1997 Nov 11;94(23):12562–12567. doi: 10.1073/pnas.94.23.12562

Table 2.

Keywords (main patterns) in the human heavy and κ chains

The results of the statistical analysis of words in the a, b, and c levels of clusters are presented for 17 fragments of the human heavy and κ chains in the Kabat database; analogous data for germ-line database sequences are given in parentheses. We illustrate the use of Table 2 with the BC fragment of the heavy chains. The Kabat numbers (25–28, in the first row) correspond to indices BC1, BC2, BC3, and BC4. Three keywords, SG(FY)X, SGG(ST), and SGDS, serve as the basis for dividing all words of the BC fragment into three clusters. The words of these clusters are found in 97% of all chains and cover 80% of different words (last two columns, Total). The six middle columns show the statistics for the a, b, and c levels of each cluster. The a columns list the number of different words (dwords) and the number of chains (chains) containing words of the a level of each of the three clusters. The b columns contain the corresponding data for the b levels, and the c columns give the total number of different words and the total number of chains in each cluster. Thus, in the a level of the first cluster there are 19 different words (6 in germ-line sequences), which are found in 533 chains (64 in germ-line sequences). The total number of chains in which words of the first cluster were found (551) and the total number of different words in its a, b, and c levels (34) are given in the c columns (the respective values for germ-line sequences are 64 and 6). The superscripts refer to the Comment to Table 2 section of the paper.
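The dwords and chains counts described in the legend can be illustrated with a minimal Python sketch. It rests on assumptions that the legend does not state: a parenthesized group such as (FY) is read as "F or Y at this position", and X is read as "any residue". The function names `keyword_to_regex` and `cluster_stats` are hypothetical, the word list is invented toy data rather than Kabat entries, and only exact keyword matches are counted, so the a, b, and c levels of the paper are not reproduced.

```python
import re
from collections import Counter


def keyword_to_regex(keyword):
    """Translate a keyword such as 'SG(FY)X' into a compiled regex.

    Assumed notation (not confirmed by the legend): '(FY)' means F or Y,
    and 'X' means any single residue.
    """
    parts = []
    for token in re.findall(r"\([A-Z]+\)|[A-Z]", keyword):
        if token.startswith("("):
            parts.append("[" + token[1:-1] + "]")  # alternatives at one position
        elif token == "X":
            parts.append(".")  # any residue
        else:
            parts.append(token)  # a fixed residue
    return re.compile("".join(parts))


def cluster_stats(words, keywords):
    """Count different words (dwords) and chains matching each keyword.

    `words` holds one fragment word per chain, so a repeated word
    means that several chains share it.
    """
    counts = Counter(words)  # word -> number of chains carrying it
    stats = {}
    for kw in keywords:
        pattern = keyword_to_regex(kw)
        matched = [w for w in counts if pattern.fullmatch(w)]
        stats[kw] = {
            "dwords": len(matched),
            "chains": sum(counts[w] for w in matched),
        }
    return stats


# Invented toy data: one four-letter BC word per chain.
toy_words = ["SGFT", "SGFT", "SGYS", "SGGS", "SGDS", "AGFT"]

if __name__ == "__main__":
    for kw, s in cluster_stats(toy_words, ["SG(FY)X", "SGG(ST)", "SGDS"]).items():
        print(kw, s)
```

Summing the per-keyword counts and dividing by the number of chains and by the number of distinct words gives coverage fractions analogous to the Total columns, under the same assumptions.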