2005 Mar;14(3):582–592. doi: 10.1110/ps.041009005

Table 2.

Percentage of proteins in various data sets with different intrinsic potentials for solubility on overexpression

	Percentage of proteins
	SI-based predictions		WH scheme predictions^a
Data set	High intrinsic potential	Low intrinsic potential	Soluble	Insoluble
S	100	0	32	68
I	30^b	70	22	78
T-soluble	60	40	13	87
T-insoluble	36	64	28	72

^a By Davis et al. (1999). The solubility of the protein is predicted from a canonical value CV, calculated as $\mathrm{CV} = 15.43 \times \frac{N + G + P + S}{n} - 29.56 \times \left| \frac{(R + K) - (D + E)}{n} - 0.03 \right|$, where N, G, P, S, R, K, D, and E represent the numbers of Asn, Gly, Pro, Ser, Arg, Lys, Asp, and Glu residues in the protein, respectively, and n is the total number of amino acids in the protein. If the difference between CV and CV′ (a discriminant whose value has been set to 1.71) is positive, the protein is predicted to be insoluble; if the difference is negative, the protein is predicted to be soluble.

^b These proteins may be regarded as potential candidates for mutation studies to enhance solubility.
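
The WH scheme calculation described in footnote a can be illustrated with a minimal Python sketch. It assumes the input is a one-letter amino acid sequence; the function names `canonical_value` and `predict_wh`, and the choice to treat a difference of exactly zero as soluble, are illustrative assumptions and are not taken from the original work.

```python
CV_PRIME = 1.71  # discriminant value stated in footnote a


def canonical_value(sequence: str) -> float:
    """Compute the canonical value CV for a one-letter amino acid sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must contain at least one residue")

    # Count of Asn, Gly, Pro, and Ser residues (N + G + P + S).
    ngps = sum(seq.count(aa) for aa in "NGPS")

    # Net charge term: (R + K) - (D + E).
    net_charge = (seq.count("R") + seq.count("K")) - (seq.count("D") + seq.count("E"))

    return 15.43 * ngps / n - 29.56 * abs(net_charge / n - 0.03)


def predict_wh(sequence: str) -> str:
    """Return 'insoluble' if CV - CV' is positive, otherwise 'soluble'.

    The footnote does not say how to treat a difference of exactly zero;
    calling it 'soluble' here is an assumption.
    """
    return "insoluble" if canonical_value(sequence) - CV_PRIME > 0 else "soluble"


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    for toy in ("NNNN", "KKKK", "AAAA"):
        print(toy, round(canonical_value(toy), 4), predict_wh(toy))
```

The toy sequences only exercise the arithmetic: a sequence rich in Asn, Gly, Pro, or Ser raises CV, whereas a large net charge per residue lowers it.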