Table 2. Dataset features.
A description of the features of the dataset used in this study.
| Feature(s) | Number of features | Binary | Description |
|---|---|---|---|
| Cosine similarity | 1 | No | Cosine similarity of the amino acid profiles at positions *i* and *j*. |
| Correlation measure | 1 | No | Correlation measure of the amino acid profiles at positions *i* and *j*. |
| Mutual information | 1 | No | Mutual information of the amino acid profiles at positions *i* and *j*. |
| Amino acid types | 10 | Yes | Type combination of the amino acid pair, with each residue classed as nonpolar, polar, acidic, or basic (10 unordered combinations). |
| Levitt’s contact potential | 1 | No | Amino acid pair energy measure. |
| Jernigan’s pairwise potential | 1 | No | Amino acid pair energy measure. |
| Braun’s pairwise potential | 1 | No | Amino acid pair energy measure. |
| MSA amino acid profiles | 483 | No | Profile value for each of the 20 amino acids, plus gap, at each of the 18 sliding window positions and five central segment positions (21 × 23). |
| MSA entropy | 23 | No | Profile entropy at each of the 18 sliding window positions and five central segment positions. |
| Solvent accessibility | 46 | Yes | Solvent accessibility (buried or exposed) of the amino acid at each of the 18 sliding window positions and five central segment positions (2 × 23). |
| Secondary structure | 69 | Yes | Secondary structure (helix, sheet, or coil) of the amino acid at each of the 18 sliding window positions and five central segment positions (3 × 23). |
| Central segment amino acid compositions | 21 | No | Overall proportion of each of the 20 amino acids, plus gap, across all central segments. |
| Central segment secondary structure compositions | 3 | No | Overall proportion of each of the three secondary structure types (helix, sheet, coil) across the central segments. |
| Central segment solvent accessibility compositions | 2 | No | Overall proportion of each of the two solvent accessibility states (buried, exposed) across the central segments. |
| Amino acid sequence separation | 16 | Yes | Sequence separation of the amino acid pair, using bins <6, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15–18, 19–23, 24–29, 30–39, 40–49, and ≥50. |
| Protein secondary structure composition | 3 | No | Overall secondary structure composition of the protein containing the contact pair. |
| Protein length | 4 | Yes | Length of the protein containing the contact pair, using bins ≤50, 51–100, 101–150, and >150. |
| Protein solvent accessibility composition | 2 | No | Overall solvent accessibility composition of the protein containing the contact pair. |
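The first rows of the table are pairwise statistics computed from the multiple sequence alignment (MSA). The sketch below shows one common way to compute them, under assumptions the table does not state: a profile is the relative frequency of each of 21 symbols (20 amino acids plus the gap `-`) in an alignment column, the correlation measure is taken to be Pearson correlation between two profile vectors, mutual information is computed from the joint symbol frequencies of the two columns, and logarithms are natural. All helper names are hypothetical, and this is not necessarily how the features were computed in this study.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids plus the gap symbol "-".
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def profile(column):
    """Relative frequency of each ALPHABET symbol in one MSA column."""
    counts = Counter(column)
    return [counts[a] / len(column) for a in ALPHABET]


def cosine_similarity(p, q):
    """Cosine of the angle between two profile vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def pearson_correlation(p, q):
    """Pearson correlation between two profile vectors (assumed 'correlation measure')."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    cov = sum((x - mp) * (y - mq) for x, y in zip(p, q))
    sp = math.sqrt(sum((x - mp) ** 2 for x in p))
    sq = math.sqrt(sum((y - mq) ** 2 for y in q))
    return cov / (sp * sq) if sp and sq else 0.0


def entropy(p):
    """Shannon entropy (natural log) of one position's profile."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(col_i, col_j):
    """Mutual information between two MSA columns from joint symbol frequencies."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), c in Counter(zip(col_i, col_j)).items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical toy alignment: one string per aligned sequence.
msa = ["ACDA", "ACEA", "GCDA", "GCEA"]
columns = [[seq[k] for seq in msa] for k in range(len(msa[0]))]
print(mutual_information(columns[0], columns[2]))  # 0.0 (the columns vary independently)
```

Cosine similarity and correlation compare only the two columns' composition, whereas mutual information uses which symbols co-occur in the same sequences, so it can detect covariation that the profile-based measures miss.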
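Rows marked Binary are indicator features. The sketch below shows how the amino acid types, sequence separation, and protein length rows could map onto 10, 16, and 4 one-hot features respectively. It reads the separation bins as <6, each of 6 through 14 individually, 15–18, 19–23, 24–29, 30–39, 40–49, and ≥50. The function names are hypothetical, and assigning individual residues to the four classes is left to the caller because the table does not specify that mapping.

```python
from itertools import combinations_with_replacement

# Residue classes named in the table; mapping residues to classes is left to the caller.
CLASSES = ("nonpolar", "polar", "acidic", "basic")
# All unordered class pairs, including same-class pairs: 10 in total.
CLASS_PAIRS = list(combinations_with_replacement(range(len(CLASSES)), 2))


def one_hot(index, size):
    """Return a 0/1 list of length `size` with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def class_pair_index(class_i, class_j):
    """Order-independent index (0-9) of the class pair of residues i and j."""
    a, b = sorted((CLASSES.index(class_i), CLASSES.index(class_j)))
    return CLASS_PAIRS.index((a, b))


def separation_bin(sep):
    """Index (0-15) of the sequence-separation bin for sep = |i - j|."""
    if sep < 6:
        return 0
    if sep <= 14:
        return sep - 5  # 6 -> 1, ..., 14 -> 9
    # Remaining bins read as 15-18, 19-23, 24-29, 30-39, 40-49.
    for k, upper in enumerate((18, 23, 29, 39, 49)):
        if sep <= upper:
            return 10 + k
    return 15  # sep >= 50


def length_bin(length):
    """Index (0-3) of the protein-length bin: <=50, 51-100, 101-150, >150."""
    for k, upper in enumerate((50, 100, 150)):
        if length <= upper:
            return k
    return 3


# Hypothetical example: a nonpolar/basic pair 7 residues apart in a 120-residue protein.
features = (
    one_hot(class_pair_index("nonpolar", "basic"), 10)
    + one_hot(separation_bin(7), 16)
    + one_hot(length_bin(120), 4)
)
```

Sorting the two class indices makes the amino acid types feature symmetric in *i* and *j*, which is why 4 classes yield 10 combinations rather than 16.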
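The position-specific counts in the Number of features column follow from 23 positions (18 sliding window plus five central segment) multiplied by the per-position size of each encoding. The short check below, which assumes one-hot encodings for the binary rows, reproduces those counts and sums the table to 688 features in total.

```python
# Sizes per alignment position, taken from the table's descriptions.
N_POSITIONS = 18 + 5  # sliding window positions + central segment positions
PER_POSITION = {
    "MSA amino acid profiles": 21,  # 20 amino acids plus gap
    "MSA entropy": 1,               # one entropy value per position
    "Solvent accessibility": 2,     # buried / exposed, assumed one-hot
    "Secondary structure": 3,       # helix / sheet / coil, assumed one-hot
}

# Rows whose sizes do not depend on the number of positions.
FIXED = {
    "Cosine similarity": 1,
    "Correlation measure": 1,
    "Mutual information": 1,
    "Amino acid types": 10,  # unordered pairs of 4 classes: 4 * 5 / 2
    "Levitt's contact potential": 1,
    "Jernigan's pairwise potential": 1,
    "Braun's pairwise potential": 1,
    "Central segment amino acid compositions": 21,
    "Central segment secondary structure compositions": 3,
    "Central segment solvent accessibility compositions": 2,
    "Amino acid sequence separation": 16,
    "Protein secondary structure composition": 3,
    "Protein length": 4,
    "Protein solvent accessibility composition": 2,
}

feature_counts = {name: size * N_POSITIONS for name, size in PER_POSITION.items()}
feature_counts.update(FIXED)
total = sum(feature_counts.values())
print(feature_counts["MSA amino acid profiles"], total)  # 483 688
```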