Sci Rep. 2016 Feb 24;6:21383. doi: 10.1038/srep21383

Table 4. Statistical analysis of site non-optimality for protein crystallizability engineering using the independent test dataset for the CRYs class (sequence redundancy removed at 25% sequence identity).

                          Secondary structure           Disorder            Buried/Exposed      Side chain entropy(b)
Ci threshold    All(a)    Coil      Helix     Sheet     Disorder  Order     Exposed   Buried    SCE       SCE_E     SCE_B
                (645599)  (296571)  (260752)  (88276)   (66129)   (536962)  (275189)  (370410)  (96423)   (74506)   (21917)
Ci > 0.005      52.2%     49.8%     56.1%     49.1%     57.3%     52.3%     51.1%     53.2%     36.3%     38.0%     30.4%
Ci > 0.010      32.3%     30.2%     35.4%     29.8%     39.3%     32.2%     31.7%     32.7%     21.4%     22.9%     16.3%
Ci > 0.02       15.3%     14.2%     16.9%     14.6%     22.7%     15.1%     15.9%     15.0%     10.6%     11.6%     7.01%
Ci > 0.05       4.08%     3.92%     4.36%     3.83%     8.80%     3.76%     4.65%     3.67%     3.37%     3.80%     1.90%
Ci > 0.1        1.22%     1.27%     1.23%     1.02%     3.83%     0.99%     1.53%     0.99%     1.26%     1.44%     0.62%
Ci > 0.2        0.33%     0.41%     0.27%     0.22%     1.64%     0.19%     0.46%     0.23%     0.43%     0.48%     0.24%

                Charged amino acids                       Hydrophobic                               Sequence loci(c)
Ci threshold    Negative      Positive      Charged       Low           Middle        High          N-terminal    Intermediate  C-terminal
                (73537)       (83209)       (156745)      (196812)      (164236)      (284551)      (36180)       (573239)      (36180)
Ci > 0.005      35.4%         55.4%         46.0%         49.6%         51.7%         54.3%         67.5%         50.6%         62.3%
Ci > 0.010      22.7%         36.0%         29.7%         31.2%         30.3%         34.1%         48.9%         30.4%         45.6%
Ci > 0.02       12.4%         19.0%         15.9%         15.7%         13.9%         16.0%         29.3%         13.7%         28.1%
Ci > 0.05       4.45%         6.01%         5.28%         4.42%         3.81%         4.01%         10.8%         3.18%         11.8%
Ci > 0.1        1.62%         2.35%         2.00%         1.35%         1.30%         1.08%         3.76%         0.78%         5.66%
Ci > 0.2        0.56%         0.95%         0.77%         0.36%         0.47%         0.23%         0.90%         0.14%         2.75%

(a) The dataset contains 2,342 proteins, of which 1,814 are currently classified as non-crystallizable. The number of residues in each group is shown in brackets.

(b) The side-chain entropy analysis considered the three residue types with high conformational entropy (Lys, Gln and Glu; KQE). SCE denotes the number of KQE residues in the entire sequence, while SCE_E and SCE_B denote the numbers of KQE residues annotated as exposed or buried, respectively.

(c) N-terminal and C-terminal denote the first and last 20 residues of each protein sequence, respectively. The Intermediate group comprises all remaining residues, i.e. those in neither terminal region.
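
The grouping described in the footnotes can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a per-residue Ci score is already available for every position of a sequence (how Ci is computed is not covered here), and the names locus_of, fraction_above and summarize are hypothetical helpers introduced only for illustration. For sequences shorter than 40 residues the two terminal windows overlap; the sketch arbitrarily gives the N-terminal window priority, which is an assumption and not something the table states.

```python
from typing import Dict, List, Sequence

# Ci thresholds listed in the table rows.
THRESHOLDS = (0.005, 0.010, 0.02, 0.05, 0.1, 0.2)
# Footnote c: terminal regions are the first and last 20 residues.
TERMINAL_LEN = 20
# Footnote b: residue types with high side-chain conformational entropy.
HIGH_ENTROPY = frozenset("KQE")


def locus_of(index: int, length: int, terminal_len: int = TERMINAL_LEN) -> str:
    """Map a 0-based residue index to N-terminal, Intermediate or C-terminal.

    Assumption: if the windows overlap (short sequences), N-terminal wins.
    """
    if index < terminal_len:
        return "N-terminal"
    if index >= length - terminal_len:
        return "C-terminal"
    return "Intermediate"


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly greater than the threshold (0.0 if empty)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


def summarize(sequence: str, ci: Sequence[float]) -> Dict[str, Dict[float, float]]:
    """Per-group fraction of sites with Ci above each threshold."""
    if len(sequence) != len(ci):
        raise ValueError("sequence and ci must have the same length")
    groups: Dict[str, List[float]] = {
        "All": [], "N-terminal": [], "Intermediate": [], "C-terminal": [], "SCE": [],
    }
    for i, (residue, score) in enumerate(zip(sequence, ci)):
        groups["All"].append(score)
        groups[locus_of(i, len(sequence))].append(score)
        if residue in HIGH_ENTROPY:
            groups["SCE"].append(score)
    return {
        name: {t: fraction_above(values, t) for t in THRESHOLDS}
        for name, values in groups.items()
    }
```

Calling summarize(sequence, ci) on one sequence returns, for each group, a mapping from threshold to the fraction of sites exceeding it. Pooling residues over a whole dataset before calling fraction_above would give percentages of the kind tabulated above; the other groups in the table (secondary structure, disorder, exposure, charge, hydrophobicity) would need per-residue annotations that this sketch does not model.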