2005 Mar;14(3):582–592. doi: 10.1110/ps.041009005

Table 2.

Percentage of proteins in various data sets with different intrinsic potentials for solubility on overexpression

Percentage of proteins (%)

              SI-based predictions                                  WH scheme predictions (a)
Data set      High intrinsic potential   Low intrinsic potential    Soluble   Insoluble
S             100                        0                          32        68
I             30 (b)                     70                         22        78
T-soluble     60                         40                         13        87
T-insoluble   36                         64                         28        72

a By Davis et al. (1999). The solubility of the protein is predicted from a canonical value, CV, calculated as

\[ \mathrm{CV} = 15.43 \times \frac{N + G + P + S}{n} - 29.56 \times \left| \frac{(R + K) - (D + E)}{n} - 0.03 \right| \]

where N, G, P, S, R, K, D, and E are the numbers of Asn, Gly, Pro, Ser, Arg, Lys, Asp, and Glu residues in the protein, respectively, and n is the total number of amino acids in the protein. If the difference CV - CV′ is positive (CV′ is a discriminant whose value has been set to 1.71), the protein is predicted to be insoluble; if the difference is negative, the protein is predicted to be soluble.

b These proteins may be regarded as potential candidates for mutational studies aimed at enhancing solubility.
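
The CV calculation described in footnote a can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the protein is supplied as a string of one-letter amino acid codes. The names `wh_canonical_value`, `predict_insoluble`, and `CV_PRIME` are hypothetical and chosen here for readability. The coefficients 15.43, 29.56, and 0.03 and the threshold 1.71 are taken from the footnote.

```python
# Sketch of the WH-scheme canonical value (CV) from footnote a.
# Assumption: the sequence is given in one-letter amino acid codes.
# Function and constant names are illustrative, not from the source.

CV_PRIME = 1.71  # discriminant value stated in footnote a


def wh_canonical_value(sequence: str) -> float:
    """Return CV = 15.43*(N+G+P+S)/n - 29.56*|((R+K)-(D+E))/n - 0.03|."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    # Fraction of Asn, Gly, Pro, and Ser residues.
    ngps_fraction = sum(seq.count(aa) for aa in "NGPS") / n

    # (R + K) - (D + E), divided by sequence length.
    charge_term = (
        seq.count("R") + seq.count("K") - seq.count("D") - seq.count("E")
    ) / n

    return 15.43 * ngps_fraction - 29.56 * abs(charge_term - 0.03)


def predict_insoluble(sequence: str) -> bool:
    """True if CV - CV' is positive, i.e. the protein is predicted insoluble."""
    return wh_canonical_value(sequence) - CV_PRIME > 0
```

Calling `predict_insoluble` on a sequence returns `True` when the footnote's rule classifies the protein as insoluble and `False` otherwise. The case CV - CV′ = 0 is not covered by the footnote; this sketch treats it as soluble, which is an arbitrary choice.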