Table 1. Summary of feature columns and fragment sizes.
Feature Name | No. of columns | No. of residues per fragment window |
---|---|---|
21-bit Sparse Coding | 21 per residue | 11 sequential residues (a sliding window of 5 residues on each side of the central residue) |
GAC | 20 per fragment | whole protein sequence |
PSSM | 20 per residue | 11 sequential residues |
EC | 1 per residue | 11 sequential residues |
LN | 5 per residue | 11 sequential residues |
Normalized ASAs | 3 per residue | 11 spatial residues (a neighboring window of size 10) |
Physicochemical (PC) property | 10 per fragment | 21 spatial residues (a neighboring window of size 20) |
Predicted secondary structure | 8 per residue | 11 sequential residues |
Because the GAC is calculated from the whole protein sequence rather than from a window, the same GAC vector is appended to each coding fragment of that protein. For the PC feature, the 21 neighboring residues of a coding fragment together yield 10 values.
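The bookkeeping implied by Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that per-residue features are flattened over their window, that all features are concatenated into one vector per fragment, and that GAC is the fractional frequency of each of the 20 standard amino acids. The names `FEATURES`, `fragment_vector_length`, `gac`, and `append_gac` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# (columns, residues in window, per-residue?) as listed in Table 1.
# The short keys are illustrative labels, not names from the source.
FEATURES = {
    "sparse_21bit": (21, 11, True),
    "GAC": (20, None, False),   # one vector per protein, reused per fragment
    "PSSM": (20, 11, True),
    "EC": (1, 11, True),
    "LN": (5, 11, True),
    "ASA": (3, 11, True),
    "PC": (10, 21, False),      # 21 spatial neighbors summarized as 10 values
    "SS": (8, 11, True),
}


def feature_width(columns, window, per_residue):
    """Width of one feature block in a flattened fragment vector (assumed layout)."""
    return columns * window if per_residue else columns


def fragment_vector_length(features=FEATURES):
    """Total length if every feature block is concatenated (an assumption)."""
    return sum(feature_width(*spec) for spec in features.values())


def gac(sequence):
    """Fractional frequency of each standard amino acid over the whole sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def append_gac(fragment_features, sequence):
    """Append the same protein-level GAC vector to every fragment's features."""
    composition = gac(sequence)
    return [list(features) + composition for features in fragment_features]
```

Under these assumptions, `fragment_vector_length()` sums the eight feature blocks of Table 1, and `append_gac` shows why every fragment from one protein shares an identical 20-value GAC tail.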